# BERT-Spam-Job-Posting-Detection-Model
A BERT-based binary classifier fine-tuned to detect whether a job posting is **fake** or **real**. Ideal for job portals, recruitment platforms, and fraud detection in job advertisements.
---
## Model Highlights
- Based on [`bert-base-uncased`](https://huggingface.co/bert-base-uncased)
- Fine-tuned on a custom dataset of job postings labeled as fake or real
- Binary classification: fake job posting vs. real job posting (a quick-start sketch follows this list)
- Lightweight and optimized for CPU and GPU inference
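
As a quick start, the checkpoint can be loaded through the Transformers `pipeline` API; a minimal sketch, assuming the model id from the Usage section below. The raw label names returned (`LABEL_0`/`LABEL_1`) depend on the `id2label` mapping in the model config; per the Usage section, class `1` corresponds to a fake posting.

```python
from transformers import pipeline

# Load the classifier through the high-level pipeline API.
# Model id taken from the Usage section below.
classifier = pipeline(
    "text-classification",
    model="AventIQ-AI/BERT-Spam-Job-Posting-Detection-Model",
)

result = classifier("Earn $500/day from home! No experience needed, apply immediately.")
print(result)  # e.g. [{'label': 'LABEL_1', 'score': ...}] -> label 1 = fake, per the Usage section
```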
---
## Intended Uses
- Automated detection of fraudulent job postings
- Job board moderation and quality control (see the triage sketch after this list)
- Enhancing recruitment platform security
- Improving user trust in job marketplaces
- Regulatory compliance monitoring for job ads
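
For the moderation use case, a probability threshold lets borderline postings go to a human reviewer rather than being auto-removed. A minimal sketch, assuming the model id from the Usage section and a hypothetical block threshold of 0.9:

```python
import torch
import torch.nn.functional as F
from transformers import BertForSequenceClassification, BertTokenizerFast

model_name = "AventIQ-AI/BERT-Spam-Job-Posting-Detection-Model"
tokenizer = BertTokenizerFast.from_pretrained(model_name)
model = BertForSequenceClassification.from_pretrained(model_name)
model.eval()

def triage(posting: str, block_threshold: float = 0.9) -> str:
    """Route a posting to 'block', 'review', or 'publish' based on the fake-class probability."""
    inputs = tokenizer(posting, return_tensors="pt", truncation=True, max_length=128)
    with torch.no_grad():
        probs = F.softmax(model(**inputs).logits, dim=-1)
    fake_prob = probs[0, 1].item()  # class 1 = fake, per the Usage section
    if fake_prob >= block_threshold:
        return "block"
    if fake_prob >= 0.5:
        return "review"  # borderline: send to a human moderator
    return "publish"

print(triage("Immediate hire! Pay a small registration fee to start earning today."))
```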
---
## Limitations
- Trained primarily on English-language job postings
- May underperform on postings from less-represented industries or regions
- Inputs are truncated at 128 tokens, so long job descriptions lose their tail content (see the length-check sketch after this list)
- Not suitable for multilingual or multimedia job posting content
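
Because of the 128-token limit, anything past the truncation point never reaches the model. A small sketch for checking whether a posting will be truncated before scoring it:

```python
from transformers import BertTokenizerFast

tokenizer = BertTokenizerFast.from_pretrained(
    "AventIQ-AI/BERT-Spam-Job-Posting-Detection-Model"
)

def will_be_truncated(text: str, max_length: int = 128) -> bool:
    # Count tokens including the [CLS]/[SEP] specials the model sees.
    n_tokens = len(tokenizer(text)["input_ids"])
    return n_tokens > max_length

long_posting = "Responsibilities: " + "maintain internal tooling, " * 100
print(will_be_truncated(long_posting))  # True -> consider summarizing or splitting the text
```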
---
## Training Details
| Field | Value |
| -------------- | ----------------------------- |
| **Base Model** | `bert-base-uncased` |
| **Dataset** | Custom labeled job postings |
| **Framework** | PyTorch with Transformers |
| **Epochs** | 3 |
| **Batch Size** | 16 |
| **Max Length** | 128 tokens |
| **Optimizer** | AdamW |
| **Loss** | CrossEntropyLoss |
| **Device** | CUDA-enabled GPU |
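
The training script itself is not part of the repository; the following is a minimal sketch of how the hyperparameters above map onto the Transformers `Trainer`. The dataset file is hypothetical, `Trainer`'s default optimizer is AdamW, and `BertForSequenceClassification` applies `CrossEntropyLoss` internally when labels are provided, matching the table.

```python
from datasets import load_dataset
from transformers import (
    BertForSequenceClassification,
    BertTokenizerFast,
    Trainer,
    TrainingArguments,
)

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

# Hypothetical CSV with "text" and "label" (0 = real, 1 = fake) columns.
dataset = load_dataset("csv", data_files="job_postings.csv")["train"]

def tokenize(batch):
    # Max length matches the 128-token limit from the table above.
    return tokenizer(batch["text"], truncation=True, padding="max_length", max_length=128)

dataset = dataset.map(tokenize, batched=True)

args = TrainingArguments(
    output_dir="bert-fake-job-detector",
    num_train_epochs=3,              # epochs from the table
    per_device_train_batch_size=16,  # batch size from the table
)

# Trainer defaults to AdamW; the model computes cross-entropy loss
# internally whenever label ids are present in the batch.
Trainer(model=model, args=args, train_dataset=dataset).train()
```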
---
## Evaluation Metrics
| Metric | Score |
| --------- | ------ |
| Accuracy | 0.97 |
| Precision | 0.81 |
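
The evaluation split is not published, but comparable metrics can be computed on any labeled hold-out set with scikit-learn. A sketch, reusing `predict_with_bert` from the Usage section below and assuming precision is reported for the positive (fake) class; the texts and labels here are illustrative only:

```python
from sklearn.metrics import accuracy_score, precision_score

# Hypothetical hold-out set: texts with gold labels (1 = fake, 0 = real).
texts = ["Work from home, instant payout!", "Senior backend engineer, 5+ years of Python."]
gold = [1, 0]

# Map the string output of predict_with_bert (defined in the Usage section) back to class ids.
preds = [1 if predict_with_bert(t) == "Fake Job" else 0 for t in texts]

print("Accuracy :", accuracy_score(gold, preds))
print("Precision:", precision_score(gold, preds))  # precision for the fake (positive) class
```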
---
## Usage
```python
from transformers import BertTokenizerFast, BertForSequenceClassification
import torch

model_name = "AventIQ-AI/BERT-Spam-Job-Posting-Detection-Model"
tokenizer = BertTokenizerFast.from_pretrained(model_name)
model = BertForSequenceClassification.from_pretrained(model_name)
model.eval()

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)

def predict_with_bert(text):
    inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True, max_length=128)
    inputs = {k: v.to(device) for k, v in inputs.items()}  # Move inputs to the model's device
    with torch.no_grad():
        logits = model(**inputs).logits
    predicted_class_id = logits.argmax(dim=-1).item()
    return "Fake Job" if predicted_class_id == 1 else "Real Job"

# Example
print(predict_with_bert("Hiring remote data entry clerk for a large online project. Apply now."))
print(predict_with_bert("Looking for a Software Engineer with 5+ years of experience in Python."))
```
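
A variant that also returns a confidence score is useful when you want to threshold decisions rather than take the argmax blindly. A minimal sketch building on the `tokenizer`, `model`, and `device` defined above:

```python
import torch
import torch.nn.functional as F

def predict_with_confidence(text):
    inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True, max_length=128)
    inputs = {k: v.to(device) for k, v in inputs.items()}
    with torch.no_grad():
        probs = F.softmax(model(**inputs).logits, dim=-1)[0]  # class probabilities
    class_id = int(probs.argmax())
    label = "Fake Job" if class_id == 1 else "Real Job"
    return label, probs[class_id].item()

print(predict_with_confidence("Urgent opening! Send a $50 processing fee to secure the role."))
```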
---
## Repository Structure
```
.
├── model/               # Quantized model files
├── tokenizer_config/    # Tokenizer and vocab files
├── model.safetensors    # Fine-tuned model in safetensors format
└── README.md            # Model card
```
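
If you work from a local clone of this repository instead of the Hub, `from_pretrained` also accepts a directory path (the path below is hypothetical):

```python
from transformers import BertForSequenceClassification, BertTokenizerFast

# Hypothetical local path to a clone of this repository.
local_dir = "./BERT-Spam-Job-Posting-Detection-Model"
tokenizer = BertTokenizerFast.from_pretrained(local_dir)
model = BertForSequenceClassification.from_pretrained(local_dir)
```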
---
## Contributing
Contributions, issues, and feature requests are welcome! Feel free to open a pull request or raise an issue. |