# 🧠 BERT-Spam-Job-Posting-Detection-Model
A BERT-based binary classifier fine-tuned to detect whether a job posting is **fake** or **real**. Ideal for job portals, recruitment platforms, and fraud detection in job advertisements.
---
## ✨ Model Highlights
- 📌 Based on [`bert-base-uncased`](https://huggingface.co/bert-base-uncased)
- 🔍 Fine-tuned on a custom dataset of job postings labeled as fake or real
- ⚡ Binary classification: Fake Job Posting vs. Real Job Posting
- 💾 Lightweight and optimized for CPU and GPU inference
---
## 🧠 Intended Uses
- Automated detection of fraudulent job postings
- Job board moderation and quality control
- Enhancing recruitment platform security
- Improving user trust in job marketplaces
- Regulatory compliance monitoring for job ads
---
## 🚫 Limitations
- Trained primarily on English-language job postings
- May underperform on postings from less-represented industries or regions
- Not optimized for job descriptions longer than 128 tokens
- Not suitable for multilingual or multimedia job posting content
---
## πŸ‹οΈβ€β™‚οΈ Training Details
| Field | Value |
| -------------- | ----------------------------- |
| **Base Model** | `bert-base-uncased` |
| **Dataset** | Custom labeled job postings |
| **Framework** | PyTorch with Transformers |
| **Epochs** | 3 |
| **Batch Size** | 16 |
| **Max Length** | 128 tokens |
| **Optimizer** | AdamW |
| **Loss** | CrossEntropyLoss |
| **Device** | CUDA-enabled GPU |
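
The exact training script is not published; the snippet below is only a minimal sketch of a fine-tuning loop consistent with the hyperparameters above. `train_texts`/`train_labels` are placeholders standing in for the private custom dataset, and the learning rate (2e-5) is an assumed value not listed in the table.

```python
import torch
from torch.utils.data import DataLoader, Dataset
from transformers import BertForSequenceClassification, BertTokenizerFast

class JobPostingDataset(Dataset):
    """Wraps tokenized job postings and labels (assumed convention: 1 = fake, 0 = real)."""
    def __init__(self, texts, labels, tokenizer, max_length=128):
        self.encodings = tokenizer(texts, truncation=True, padding="max_length", max_length=max_length)
        self.labels = labels

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, idx):
        item = {k: torch.tensor(v[idx]) for k, v in self.encodings.items()}
        item["labels"] = torch.tensor(self.labels[idx])
        return item

# Placeholder data standing in for the private custom dataset
train_texts = [
    "Hiring remote data entry clerk. No experience needed, pay a small registration fee.",
    "Software Engineer, 5+ years of Python experience, on-site interviews required.",
]
train_labels = [1, 0]  # assumed convention: 1 = fake, 0 = real

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2).to(device)

train_loader = DataLoader(JobPostingDataset(train_texts, train_labels, tokenizer),
                          batch_size=16, shuffle=True)
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)  # learning rate is an assumption

model.train()
for epoch in range(3):  # 3 epochs, as reported above
    for batch in train_loader:
        batch = {k: v.to(device) for k, v in batch.items()}
        loss = model(**batch).loss  # CrossEntropyLoss is applied internally when labels are passed
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()
```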
---
## 📊 Evaluation Metrics
| Metric | Score |
| --------- | ------ |
| Accuracy | 0.97 |
| Precision | 0.81 |
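
These scores were reported by the model authors and cannot be reproduced without the original evaluation split. The sketch below only illustrates how accuracy and precision can be computed with scikit-learn (an assumed dependency); `eval_texts`/`eval_labels` are placeholders for your own held-out labeled data.

```python
import torch
from sklearn.metrics import accuracy_score, precision_score
from transformers import BertForSequenceClassification, BertTokenizerFast

model_name = "AventIQ-AI/BERT-Spam-Job-Posting-Detection-Model"
tokenizer = BertTokenizerFast.from_pretrained(model_name)
model = BertForSequenceClassification.from_pretrained(model_name)
model.eval()
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)

# Placeholder held-out split; replace with your own labeled data
eval_texts = [
    "Earn $500 per day from home, send a registration fee to get started.",
    "Senior Data Analyst, SQL and Tableau required, hybrid role in Chicago.",
]
eval_labels = [1, 0]  # convention from the usage snippet below: 1 = fake, 0 = real

with torch.no_grad():
    inputs = tokenizer(eval_texts, return_tensors="pt", truncation=True,
                       padding=True, max_length=128).to(device)
    preds = model(**inputs).logits.argmax(dim=-1).cpu().tolist()

print("Accuracy :", accuracy_score(eval_labels, preds))
print("Precision:", precision_score(eval_labels, preds))  # positive class = fake (label 1)
```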
---
## 🚀 Usage
```python
from transformers import BertTokenizerFast, BertForSequenceClassification
import torch
model_name = "AventIQ-AI/BERT-Spam-Job-Posting-Detection-Model"
tokenizer = BertTokenizerFast.from_pretrained(model_name)
model = BertForSequenceClassification.from_pretrained(model_name)
model.eval()
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)
def predict_with_bert(text):
    inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True, max_length=128)
    device = next(model.parameters()).device  # Get model device (cpu or cuda)
    inputs = {k: v.to(device) for k, v in inputs.items()}
    with torch.no_grad():
        logits = model(**inputs).logits
    predicted_class_id = logits.argmax().item()
    return "Fake Job" if predicted_class_id == 1 else "Real Job"
# Example
print(predict_with_bert("Hiring remote data entry clerk for a large online project. Apply now."))
print(predict_with_bert("Looking for a Software Engineer with 5+ years of experience in Python."))
```
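
If you also want a confidence score alongside the label, the variant below (building on the `model`, `tokenizer`, and `device` objects defined above) applies a softmax to the logits. It is a convenience sketch, not part of the published interface.

```python
import torch.nn.functional as F

def predict_with_confidence(text):
    """Return (label, probability) using the model, tokenizer, and device defined above."""
    inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True, max_length=128)
    inputs = {k: v.to(device) for k, v in inputs.items()}
    with torch.no_grad():
        probs = F.softmax(model(**inputs).logits, dim=-1).squeeze()
    predicted_class_id = int(probs.argmax())
    label = "Fake Job" if predicted_class_id == 1 else "Real Job"
    return label, round(float(probs[predicted_class_id]), 4)

# Example
print(predict_with_confidence("Work from home and earn $5000 weekly, no interview required!"))
```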
## 🗂 Repository Structure
```
.
├── model/               # Quantized model files
├── tokenizer_config/    # Tokenizer and vocab files
├── model.safetensors    # Fine-tuned model in safetensors format
└── README.md            # Model card
```
---
## 🤝 Contributing
Contributions, issues, and feature requests are welcome! Feel free to open a pull request or raise an issue.