# 🧠 BERT-Spam-Job-Posting-Detection-Model

A BERT-based binary classifier fine-tuned to detect whether a job posting is **fake** or **real**. Ideal for job portals, recruitment platforms, and fraud detection in job advertisements.

---

## ✨ Model Highlights

- 📌 Based on [`bert-base-uncased`](https://huggingface.co/bert-base-uncased)
- 🔍 Fine-tuned on a custom dataset of job postings labeled as fake or real
- ⚡ Binary classification: Fake Job Posting vs Real Job Posting
- 💾 Lightweight and optimized for CPU and GPU inference

---

## 🧠 Intended Uses

- Automated detection of fraudulent job postings  
- Job board moderation and quality control  
- Enhancing recruitment platform security  
- Improving user trust in job marketplaces  
- Regulatory compliance monitoring for job ads

---

## 🚫 Limitations

- Trained primarily on English-language job postings  
- May underperform on postings from less-represented industries or regions  
- Not optimized for job descriptions longer than 128 tokens  
- Not suitable for multilingual or multimedia job posting content
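
For postings longer than the 128-token limit, one possible workaround is to split the token sequence into overlapping windows and classify each window separately. The sketch below shows only the windowing step in plain Python. The helper name `split_into_windows`, the step size of 64, and the two positions reserved for `[CLS]` and `[SEP]` are illustrative assumptions, not part of the released model.

```python
def split_into_windows(token_ids, max_length=128, stride=64, num_special=2):
    """Split a list of token ids into overlapping windows.

    Each window holds at most max_length - num_special ids, leaving room
    for the [CLS] and [SEP] tokens (an assumption of this sketch).
    `stride` is the step between window starts.
    """
    window = max_length - num_special
    if window <= 0 or stride <= 0:
        raise ValueError("max_length must exceed num_special and stride must be positive")
    if len(token_ids) <= window:
        return [list(token_ids)]
    windows = []
    start = 0
    while True:
        windows.append(list(token_ids[start:start + window]))
        if start + window >= len(token_ids):
            break
        start += stride
    return windows


# Hypothetical posting of 300 token ids (plain integers stand in for real ids)
chunks = split_into_windows(list(range(300)))
print([len(c) for c in chunks])  # prints [126, 126, 126, 108]
```

How per-window predictions are combined (for example, flagging the posting if any window is classified as fake) is a design choice that would need to be validated on held-out data. The fast tokenizers in Transformers can produce similar windows via `return_overflowing_tokens=True`, where `stride` denotes the overlap between windows rather than the step.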

---

## πŸ‹οΈβ€β™‚οΈ Training Details

| Field          | Value                       |
| -------------- | --------------------------- |
| **Base Model** | `bert-base-uncased`         |
| **Dataset**    | Custom labeled job postings |
| **Framework**  | PyTorch with Transformers   |
| **Epochs**     | 3                           |
| **Batch Size** | 16                          |
| **Max Length** | 128 tokens                  |
| **Optimizer**  | AdamW                       |
| **Loss**       | CrossEntropyLoss            |
| **Device**     | CUDA-enabled GPU            |

---

## 📊 Evaluation Metrics

| Metric    | Score  |
| --------- | ------ |
| Accuracy  | 0.97   |
| Precision | 0.81   |
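
Accuracy and precision can differ noticeably, for instance when one class is much rarer than the other. As a minimal sketch, the following computes both metrics from purely hypothetical confusion-matrix counts. These counts are not the model's actual evaluation results; they were chosen only so the output lands near the scores in the table, and "fake" is assumed to be the positive class.

```python
def accuracy(tp, fp, tn, fn):
    """Fraction of all postings classified correctly."""
    return (tp + tn) / (tp + fp + tn + fn)


def precision(tp, fp):
    """Fraction of postings flagged as fake that really are fake."""
    return tp / (tp + fp) if (tp + fp) else 0.0


# Hypothetical counts for 1,000 postings, for illustration only
tp, fp, tn, fn = 43, 10, 927, 20

print(f"accuracy  = {accuracy(tp, fp, tn, fn):.2f}")  # accuracy  = 0.97
print(f"precision = {precision(tp, fp):.2f}")         # precision = 0.81
```

Recall and F1 are not reported in the table; computing them from a real confusion matrix would show how many fake postings the classifier misses.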

---

## 🚀 Usage

```python
from transformers import BertTokenizerFast, BertForSequenceClassification
import torch

model_name = "AventIQ-AI/BERT-Spam-Job-Posting-Detection-Model"
tokenizer = BertTokenizerFast.from_pretrained(model_name)
model = BertForSequenceClassification.from_pretrained(model_name)
model.eval()

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)

def predict_with_bert(text):
    """Classify a single job posting as "Fake Job" or "Real Job"."""
    # Tokenize, truncating to the 128-token limit used during training
    inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True, max_length=128)
    # Move the input tensors to the same device as the model
    inputs = {k: v.to(device) for k, v in inputs.items()}
    with torch.no_grad():
        logits = model(**inputs).logits

    # Argmax over the class dimension (label 1 = fake, label 0 = real)
    predicted_class_id = logits.argmax(dim=-1).item()
    return "Fake Job" if predicted_class_id == 1 else "Real Job"

# Example
print(predict_with_bert("Hiring remote data entry clerk for a large online project. Apply now."))
print(predict_with_bert("Looking for a Software Engineer with 5+ years of experience in Python."))
```
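
`predict_with_bert` returns only a hard label. For moderation workflows it may help to turn the two logits into a probability and apply a tunable threshold. The sketch below does this in plain Python on made-up logits. The helper names `fake_probability` and `flag_posting`, the default threshold of 0.5, and the example values are assumptions for illustration; index 1 is taken to be the fake class, as in the snippet above.

```python
import math


def fake_probability(logits):
    """Softmax over [real_logit, fake_logit]; returns P(fake)."""
    real_logit, fake_logit = logits
    # Subtract the max for numerical stability before exponentiating
    m = max(real_logit, fake_logit)
    exp_real = math.exp(real_logit - m)
    exp_fake = math.exp(fake_logit - m)
    return exp_fake / (exp_real + exp_fake)


def flag_posting(logits, threshold=0.5):
    """Return "Fake Job" when P(fake) reaches the threshold, else "Real Job"."""
    return "Fake Job" if fake_probability(logits) >= threshold else "Real Job"


# Made-up logits; in practice they could come from model(**inputs).logits[0].tolist()
example_logits = [1.2, 0.4]
print(round(fake_probability(example_logits), 3))   # 0.31
print(flag_posting(example_logits))                 # Real Job
print(flag_posting(example_logits, threshold=0.3))  # Fake Job
```

Lowering the threshold flags more postings for review at the cost of more false alarms, so a suitable value is best chosen on validation data.
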
## 🗂 Repository Structure
```
.
├── model/               # Quantized model files
├── tokenizer_config/    # Tokenizer and vocab files
├── model.safetensors    # Fine-tuned model in safetensors format
├── README.md            # Model card
```
---
## 🤝 Contributing
Contributions, issues, and feature requests are welcome! Feel free to open a pull request or raise an issue.