BERT-Spam-Job-Posting-Detection-Model

A BERT-based binary classifier fine-tuned to detect whether a job posting is fake or real. It is intended for job portals, recruitment platforms, and other settings that screen job advertisements for fraud.
Model Highlights
- Based on bert-base-uncased
- Fine-tuned on a custom dataset of job postings labeled as fake or real
- Binary classification: Fake Job Posting vs. Real Job Posting
- Lightweight and optimized for CPU and GPU inference
Intended Uses
- Automated detection of fraudulent job postings
- Job board moderation and quality control
- Enhancing recruitment platform security
- Improving user trust in job marketplaces
- Regulatory compliance monitoring for job ads
Limitations
- Trained primarily on English-language job postings
- May underperform on postings from less-represented industries or regions
- Inputs are truncated to 128 tokens (the training max length), so any text past that point in a longer job description is ignored
- Not suitable for multilingual or multimedia job posting content
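Because inputs are cut at 128 tokens, a long posting is classified on its opening text only. One common workaround, not something this model card describes, is to split the token sequence into overlapping windows, score each window, and combine the scores. The sketch below covers only the splitting and combining steps in plain Python. The window size, the `step` between window starts, and taking the maximum per-window P(fake) are illustrative assumptions, and `sliding_windows` and `aggregate_fake_probability` are hypothetical helpers, not part of this repository.

```python
from typing import List, Sequence


def sliding_windows(token_ids: Sequence[int], max_len: int = 128, step: int = 64,
                    num_special: int = 2) -> List[List[int]]:
    """Split token ids into overlapping windows that each fit within max_len.

    Assumptions: token_ids were produced without special tokens, and each
    window will later get [CLS] and [SEP] added (hence num_special=2).
    Windows start `step` tokens apart, so they overlap when step < window size.
    """
    window = max_len - num_special
    if window <= 0 or step <= 0:
        raise ValueError("max_len must exceed num_special, and step must be positive")
    ids = list(token_ids)
    chunks: List[List[int]] = []
    start = 0
    while True:
        chunks.append(ids[start:start + window])
        if start + window >= len(ids):
            break  # this window already reaches the end of the sequence
        start += step
    return chunks


def aggregate_fake_probability(window_probs: Sequence[float]) -> float:
    """Assumption: a posting is as suspicious as its most suspicious window."""
    return max(window_probs)
```

The token ids could come from `tokenizer(text, add_special_tokens=False)["input_ids"]`, and each window would then need its special tokens restored (for example with `tokenizer.build_inputs_with_special_tokens(window)`) before being scored. Hugging Face fast tokenizers can also produce overlapping windows directly via `return_overflowing_tokens=True` together with `stride`. Taking the maximum favors catching fraud hidden deep in a posting, at some risk to precision; averaging is a more conservative alternative.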
Training Details

| Field | Value |
|---|---|
| Base Model | bert-base-uncased |
| Dataset | Custom labeled job postings |
| Framework | PyTorch with Hugging Face Transformers |
| Epochs | 3 |
| Batch Size | 16 |
| Max Length | 128 tokens |
| Optimizer | AdamW |
| Loss | CrossEntropyLoss |
| Device | CUDA-enabled GPU |
Evaluation Metrics

| Metric | Score |
|---|---|
| Accuracy | 0.97 |
| Precision | 0.81 |
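The table does not state which class the precision figure refers to. Assuming the positive class is "Fake Job" (label 1, as in the usage code), both metrics can be recomputed from a list of predictions with a few lines of plain Python. The function name and the toy labels below are hypothetical and for illustration only; they show how accuracy can stay high while precision drops when fake postings are a small minority of the test set.

```python
from typing import Sequence, Tuple


def accuracy_and_precision(y_true: Sequence[int], y_pred: Sequence[int],
                           positive: int = 1) -> Tuple[float, float]:
    """Return (accuracy, precision), treating `positive` as the positive class.

    Assumption: label 1 means "Fake Job", matching predict_with_bert.
    """
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and the same length")
    pairs = list(zip(y_true, y_pred))
    correct = sum(t == p for t, p in pairs)
    true_pos = sum(t == positive and p == positive for t, p in pairs)
    false_pos = sum(t != positive and p == positive for t, p in pairs)
    accuracy = correct / len(pairs)
    # Precision is undefined when nothing is predicted positive; report 0.0 then
    predicted_pos = true_pos + false_pos
    precision = true_pos / predicted_pos if predicted_pos else 0.0
    return accuracy, precision


# Hypothetical toy labels: 8 real postings (0) and 2 fake ones (1)
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 0, 0, 0, 0, 1, 1, 1]  # one real posting wrongly flagged as fake
acc, prec = accuracy_and_precision(y_true, y_pred)
print(f"accuracy={acc:.2f} precision={prec:.2f}")  # accuracy=0.90 precision=0.67
```

A single false alarm costs only 0.1 accuracy here but a third of the precision, because precision is computed over the few postings flagged as fake.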
Usage
```python
from transformers import BertTokenizerFast, BertForSequenceClassification
import torch

model_name = "AventIQ-AI/BERT-Spam-Job-Posting-Detection-Model"
tokenizer = BertTokenizerFast.from_pretrained(model_name)
model = BertForSequenceClassification.from_pretrained(model_name)
model.eval()  # disable dropout for inference

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)


def predict_with_bert(text):
    # Tokenize; anything past 128 tokens is truncated, matching the training setup
    inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True, max_length=128)
    device = next(model.parameters()).device  # get model device (cpu or cuda)
    inputs = {k: v.to(device) for k, v in inputs.items()}
    with torch.no_grad():
        logits = model(**inputs).logits
    # Highest-scoring class for the single input: 1 = fake, 0 = real
    predicted_class_id = logits.argmax(dim=-1).item()
    return "Fake Job" if predicted_class_id == 1 else "Real Job"


# Example
print(predict_with_bert("Hiring remote data entry clerk for a large online project. Apply now."))
print(predict_with_bert("Looking for a Software Engineer with 5+ years of experience in Python."))
```
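`predict_with_bert` returns only the highest-scoring label. For a moderation queue it can be more useful to expose P(fake) and choose a cut-off: a higher threshold flags fewer postings, typically raising precision at the cost of recall. The sketch below is plain Python operating on the two logits of a single posting (for example `logits[0].tolist()` inside `predict_with_bert`). It assumes index 1 is the fake class, as in the usage code; `label_with_threshold` is a hypothetical helper, not part of the model's API, and any threshold other than 0.5 would need tuning on held-out data.

```python
import math
from typing import List, Sequence, Tuple


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax: subtract the max logit before exponentiating."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def label_with_threshold(logits: Sequence[float], threshold: float = 0.5,
                         fake_index: int = 1) -> Tuple[str, float]:
    """Return (label, P(fake)); flag as fake only when P(fake) exceeds threshold.

    Assumption: index 1 is the fake class, matching predict_with_bert.
    With two classes and threshold=0.5 this matches argmax, including exact
    ties (argmax picks index 0, and 0.5 > 0.5 is False).
    """
    p_fake = softmax(logits)[fake_index]
    label = "Fake Job" if p_fake > threshold else "Real Job"
    return label, p_fake


# Hypothetical logits for one posting: [real_score, fake_score]
print(label_with_threshold([0.0, 2.0]))                  # flagged at the default threshold
print(label_with_threshold([0.0, 2.0], threshold=0.95))  # not flagged at a stricter threshold
```

Using a strict `>` comparison keeps the default behavior identical to the argmax decision in `predict_with_bert`, so the threshold only changes results when it is moved away from 0.5.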
Repository Structure

```
.
├── model/               # Quantized model files
├── tokenizer_config/    # Tokenizer and vocab files
├── model.safetensors    # Fine-tuned model in safetensors format
└── README.md            # Model card
```
Contributing
Contributions, issues, and feature requests are welcome! Feel free to open a pull request or raise an issue.