# BERT-Spam-Job-Posting-Detection-Model
A BERT-based binary classifier fine-tuned to detect whether a job posting is **fake** or **real**. Ideal for job portals, recruitment platforms, and fraud detection in job advertisements.
---
## Model Highlights
- Based on [`bert-base-uncased`](https://huggingface.co/bert-base-uncased)
- Fine-tuned on a custom dataset of job postings labeled as fake or real
- Binary classification: fake job posting vs. real job posting (a quick-start sketch follows this list)
- Lightweight and optimized for CPU and GPU inference
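
As a quick start, the checkpoint can be loaded through the Transformers `pipeline` API; a minimal sketch, assuming the model id from the Usage section below. The raw label names returned (`LABEL_0`/`LABEL_1`) depend on the `id2label` mapping in the model config; per the Usage section, class `1` corresponds to a fake posting.

```python
from transformers import pipeline

# Load the classifier through the high-level pipeline API.
# Model id taken from the Usage section below.
classifier = pipeline(
    "text-classification",
    model="AventIQ-AI/BERT-Spam-Job-Posting-Detection-Model",
)

result = classifier("Earn $500/day from home! No experience needed, apply immediately.")
print(result)  # e.g. [{'label': 'LABEL_1', 'score': ...}] -> label 1 = fake, per the Usage section
```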
---
## Intended Uses
- Automated detection of fraudulent job postings
- Job board moderation and quality control (see the triage sketch after this list)
- Enhancing recruitment platform security
- Improving user trust in job marketplaces
- Regulatory compliance monitoring for job ads
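
For the moderation use case, a probability threshold lets borderline postings go to a human reviewer rather than being auto-removed. A minimal sketch, assuming the model id from the Usage section and a hypothetical block threshold of 0.9:

```python
import torch
import torch.nn.functional as F
from transformers import BertForSequenceClassification, BertTokenizerFast

model_name = "AventIQ-AI/BERT-Spam-Job-Posting-Detection-Model"
tokenizer = BertTokenizerFast.from_pretrained(model_name)
model = BertForSequenceClassification.from_pretrained(model_name)
model.eval()

def triage(posting: str, block_threshold: float = 0.9) -> str:
    """Route a posting to 'block', 'review', or 'publish' based on the fake-class probability."""
    inputs = tokenizer(posting, return_tensors="pt", truncation=True, max_length=128)
    with torch.no_grad():
        probs = F.softmax(model(**inputs).logits, dim=-1)
    fake_prob = probs[0, 1].item()  # class 1 = fake, per the Usage section
    if fake_prob >= block_threshold:
        return "block"
    if fake_prob >= 0.5:
        return "review"  # borderline: send to a human moderator
    return "publish"

print(triage("Immediate hire! Pay a small registration fee to start earning today."))
```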
---
## Limitations
- Trained primarily on English-language job postings
- May underperform on postings from less-represented industries or regions
- Inputs are truncated at 128 tokens, so long job descriptions lose their tail content (see the length-check sketch after this list)
- Not suitable for multilingual or multimedia job posting content
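
Because of the 128-token limit, anything past the truncation point never reaches the model. A small sketch for checking whether a posting will be truncated before scoring it:

```python
from transformers import BertTokenizerFast

tokenizer = BertTokenizerFast.from_pretrained(
    "AventIQ-AI/BERT-Spam-Job-Posting-Detection-Model"
)

def will_be_truncated(text: str, max_length: int = 128) -> bool:
    # Count tokens including the [CLS]/[SEP] specials the model sees.
    n_tokens = len(tokenizer(text)["input_ids"])
    return n_tokens > max_length

long_posting = "Responsibilities: " + "maintain internal tooling, " * 100
print(will_be_truncated(long_posting))  # True -> consider summarizing or splitting the text
```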
---
## Training Details
| Field | Value |
| -------------- | ----------------------------- |
| **Base Model** | `bert-base-uncased` |
| **Dataset** | Custom labeled job postings |
| **Framework** | PyTorch with Transformers |
| **Epochs** | 3 |
| **Batch Size** | 16 |
| **Max Length** | 128 tokens |
| **Optimizer** | AdamW |
| **Loss** | CrossEntropyLoss |
| **Device** | CUDA-enabled GPU |
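
The training script itself is not part of the repository; the following is a minimal sketch of how the hyperparameters above map onto the Transformers `Trainer`. The dataset file is hypothetical, `Trainer`'s default optimizer is AdamW, and `BertForSequenceClassification` applies `CrossEntropyLoss` internally when labels are provided, matching the table.

```python
from datasets import load_dataset
from transformers import (
    BertForSequenceClassification,
    BertTokenizerFast,
    Trainer,
    TrainingArguments,
)

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

# Hypothetical CSV with "text" and "label" (0 = real, 1 = fake) columns.
dataset = load_dataset("csv", data_files="job_postings.csv")["train"]

def tokenize(batch):
    # Max length matches the 128-token limit from the table above.
    return tokenizer(batch["text"], truncation=True, padding="max_length", max_length=128)

dataset = dataset.map(tokenize, batched=True)

args = TrainingArguments(
    output_dir="bert-fake-job-detector",
    num_train_epochs=3,              # epochs from the table
    per_device_train_batch_size=16,  # batch size from the table
)

# Trainer defaults to AdamW; the model computes cross-entropy loss
# internally whenever label ids are present in the batch.
Trainer(model=model, args=args, train_dataset=dataset).train()
```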
---
## Evaluation Metrics
| Metric | Score |
| --------- | ------ |
| Accuracy | 0.97 |
| Precision | 0.81 |
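
The evaluation split is not published, but comparable metrics can be computed on any labeled hold-out set with scikit-learn. A sketch, reusing `predict_with_bert` from the Usage section below and assuming precision is reported for the positive (fake) class; the texts and labels here are illustrative only:

```python
from sklearn.metrics import accuracy_score, precision_score

# Hypothetical hold-out set: texts with gold labels (1 = fake, 0 = real).
texts = ["Work from home, instant payout!", "Senior backend engineer, 5+ years of Python."]
gold = [1, 0]

# Map the string output of predict_with_bert (defined in the Usage section) back to class ids.
preds = [1 if predict_with_bert(t) == "Fake Job" else 0 for t in texts]

print("Accuracy :", accuracy_score(gold, preds))
print("Precision:", precision_score(gold, preds))  # precision for the fake (positive) class
```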
---
## Usage
```python
from transformers import BertTokenizerFast, BertForSequenceClassification
import torch

model_name = "AventIQ-AI/BERT-Spam-Job-Posting-Detection-Model"
tokenizer = BertTokenizerFast.from_pretrained(model_name)
model = BertForSequenceClassification.from_pretrained(model_name)
model.eval()

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)

def predict_with_bert(text):
    inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True, max_length=128)
    inputs = {k: v.to(device) for k, v in inputs.items()}  # Move inputs to the model's device
    with torch.no_grad():
        logits = model(**inputs).logits
    predicted_class_id = logits.argmax(dim=-1).item()
    return "Fake Job" if predicted_class_id == 1 else "Real Job"

# Example
print(predict_with_bert("Hiring remote data entry clerk for a large online project. Apply now."))
print(predict_with_bert("Looking for a Software Engineer with 5+ years of experience in Python."))
```
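
A variant that also returns a confidence score is useful when you want to threshold decisions rather than take the argmax blindly. A minimal sketch building on the `tokenizer`, `model`, and `device` defined above:

```python
import torch
import torch.nn.functional as F

def predict_with_confidence(text):
    inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True, max_length=128)
    inputs = {k: v.to(device) for k, v in inputs.items()}
    with torch.no_grad():
        probs = F.softmax(model(**inputs).logits, dim=-1)[0]  # class probabilities
    class_id = int(probs.argmax())
    label = "Fake Job" if class_id == 1 else "Real Job"
    return label, probs[class_id].item()

print(predict_with_confidence("Urgent opening! Send a $50 processing fee to secure the role."))
```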
---
## Repository Structure
```
.
├── model/               # Quantized model files
├── tokenizer_config/    # Tokenizer and vocab files
├── model.safetensors    # Fine-tuned model in safetensors format
└── README.md            # Model card
```
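
If you work from a local clone of this repository instead of the Hub, `from_pretrained` also accepts a directory path (the path below is hypothetical):

```python
from transformers import BertForSequenceClassification, BertTokenizerFast

# Hypothetical local path to a clone of this repository.
local_dir = "./BERT-Spam-Job-Posting-Detection-Model"
tokenizer = BertTokenizerFast.from_pretrained(local_dir)
model = BertForSequenceClassification.from_pretrained(local_dir)
```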
---
## Contributing
Contributions, issues, and feature requests are welcome! Feel free to open a pull request or raise an issue. |