
🧠 BERT-Spam-Job-Posting-Detection-Model

A BERT-based binary classifier fine-tuned to detect whether a job posting is fake or real. It is intended for job portals, recruitment platforms, and fraud detection in job advertisements.


✨ Model Highlights

  • 📌 Based on bert-base-uncased
  • 🔍 Fine-tuned on a custom dataset of job postings labeled as fake or real
  • ⚑ Binary classification: Fake Job Posting vs Real Job Posting
  • πŸ’Ύ Lightweight and optimized for CPU and GPU inference

🧠 Intended Uses

  • Automated detection of fraudulent job postings
  • Job board moderation and quality control
  • Enhancing recruitment platform security
  • Improving user trust in job marketplaces
  • Regulatory compliance monitoring for job ads

🚫 Limitations

  • Trained primarily on English-language job postings
  • May underperform on postings from less-represented industries or regions
  • Inputs are truncated to 128 tokens, so any content beyond that point in a longer job description is ignored
  • Not suitable for multilingual or multimedia job posting content
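
Since the model was trained with a maximum length of 128 tokens, one possible workaround for long postings is to split the text into overlapping windows and classify each window separately. The sketch below is an assumption, not part of the released model: `split_into_windows` and `any_window_flagged` are hypothetical helpers, and the split is on whitespace as a rough stand-in for BERT's WordPiece tokenizer, so real token counts will differ.

```python
def split_into_windows(text, window=100, stride=50):
    """Split text into overlapping word windows.

    Hypothetical helper: words are only a rough proxy for WordPiece tokens,
    so the default window stays below 128 to leave headroom for subword
    splits and the [CLS]/[SEP] special tokens.
    """
    if window <= 0 or stride <= 0:
        raise ValueError("window and stride must be positive")
    words = text.split()
    if len(words) <= window:
        return [" ".join(words)] if words else []
    windows = []
    for start in range(0, len(words), stride):
        windows.append(" ".join(words[start:start + window]))
        if start + window >= len(words):
            break  # this window already reaches the end of the text
    return windows


def any_window_flagged(labels):
    """Flag a posting if at least one window was classified as fake."""
    return any(label == "Fake Job" for label in labels)
```

Each window could then be passed to the prediction function from the Usage section. Flagging a posting when any window is flagged is a deliberately conservative choice; averaging per-window scores is an alternative that trades recall for fewer false alarms.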

🏋️‍♂️ Training Details

| Field | Value |
|---|---|
| Base Model | bert-base-uncased |
| Dataset | Custom labeled job postings |
| Framework | PyTorch with Transformers |
| Epochs | 3 |
| Batch Size | 16 |
| Max Length | 128 tokens |
| Optimizer | AdamW |
| Loss | CrossEntropyLoss |
| Device | CUDA-enabled GPU |

📊 Evaluation Metrics

| Metric | Score |
|---|---|
| Accuracy | 0.97 |
| Precision | 0.81 |
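
A gap between accuracy and precision like the one reported here often appears when one class is a small minority of the data. The card does not state the class balance, so that is only one plausible reading. The sketch below shows how both metrics are computed from confusion-matrix counts, treating "fake" as the positive class (an assumption). The counts are invented purely for illustration and are not the model's evaluation results.

```python
def accuracy(tp, fp, tn, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + fp + tn + fn)


def precision(tp, fp):
    """Fraction of postings flagged as fake that really are fake."""
    return tp / (tp + fp) if (tp + fp) else 0.0


# Hypothetical counts for 1,000 postings, of which only 50 are fake.
tp, fp, tn, fn = 40, 10, 940, 10

print(f"accuracy  = {accuracy(tp, fp, tn, fn):.2f}")
print(f"precision = {precision(tp, fp):.2f}")
```

With these made-up counts, accuracy is 0.98 while precision is 0.80: the large number of correctly accepted real postings dominates accuracy, whereas precision looks only at the postings that were flagged.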

🚀 Usage

from transformers import BertTokenizerFast, BertForSequenceClassification
import torch

model_name = "AventIQ-AI/BERT-Spam-Job-Posting-Detection-Model"
tokenizer = BertTokenizerFast.from_pretrained(model_name)
model = BertForSequenceClassification.from_pretrained(model_name)
model.eval()

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)

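def predict_with_bert(text):
    """Classify a single job posting as "Fake Job" or "Real Job"."""
    # Inputs longer than 128 tokens are truncated, matching the training setup.
    inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True, max_length=128)
    # Move the input tensors to whichever device the model is on (CPU or CUDA).
    device = next(model.parameters()).device
    inputs = {k: v.to(device) for k, v in inputs.items()}
    with torch.no_grad():
        logits = model(**inputs).logits  # shape: (1, 2)

    # Take the argmax over the class dimension; class 1 is treated as fake.
    predicted_class_id = logits.argmax(dim=-1).item()
    return "Fake Job" if predicted_class_id == 1 else "Real Job"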

# Example
print(predict_with_bert("Hiring remote data entry clerk for a large online project. Apply now."))
print(predict_with_bert("Looking for a Software Engineer with 5+ years of experience in Python."))
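
The function above returns only a hard label. Where a confidence score is useful, for example to send borderline postings to human review, the two logits can be converted to probabilities with a softmax. The sketch below is pure Python and takes the logits as plain floats (for instance from `logits[0].tolist()`). `fake_probability`, `triage`, and the threshold values are hypothetical, and class index 1 is assumed to mean fake, as in the usage code.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fake_probability(logits):
    """Probability of the 'fake' class, assumed to be index 1."""
    return softmax(logits)[1]


def triage(logits, flag_at=0.9, review_at=0.5):
    """Map a pair of logits to an action using illustrative thresholds."""
    p_fake = fake_probability(logits)
    if p_fake >= flag_at:
        return "flag"
    if p_fake >= review_at:
        return "review"
    return "accept"
```

The thresholds here are placeholders. In practice they would be tuned on a held-out set according to how costly a missed fake posting is compared with a wrongly flagged real one.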

🗂 Repository Structure

.
├── model/               # Quantized model files
├── tokenizer_config/    # Tokenizer and vocab files
├── model.safetensors/   # Fine-tuned model in safetensors format
├── README.md            # Model card

🤝 Contributing

Contributions, issues, and feature requests are welcome! Feel free to open a pull request or raise an issue.