vishal1364 committed 16a5de0 (parent: c0789e8): Create README.md

# 🧠 BERT-Spam-Job-Posting-Detection-Model

A BERT-based binary classifier fine-tuned to detect whether a job posting is **fake** or **real**. It is suited to job portals, recruitment platforms, and fraud detection in job advertisements.

---

## ✨ Model Highlights

- 📌 Based on [`bert-base-uncased`](https://huggingface.co/bert-base-uncased)
- 🔍 Fine-tuned on a custom dataset of job postings labeled as fake or real
- ⚡ Binary classification: Fake Job Posting vs. Real Job Posting
- 💾 Lightweight enough for inference on either CPU or GPU

---

## 🧠 Intended Uses

- Automated detection of fraudulent job postings
- Job board moderation and quality control
- Enhancing recruitment platform security
- Improving user trust in job marketplaces
- Regulatory compliance monitoring for job ads

---

## 🚫 Limitations

- Trained primarily on English-language job postings
- May underperform on postings from underrepresented industries or regions
- Inputs longer than 128 tokens are truncated, so text beyond that point in long job descriptions is ignored
- Not suitable for multilingual or multimedia job posting content

---

## 🏋️‍♂️ Training Details

| Field          | Value                       |
| -------------- | --------------------------- |
| **Base Model** | `bert-base-uncased`         |
| **Dataset**    | Custom labeled job postings |
| **Framework**  | PyTorch with Transformers   |
| **Epochs**     | 3                           |
| **Batch Size** | 16                          |
| **Max Length** | 128 tokens                  |
| **Optimizer**  | AdamW                       |
| **Loss**       | CrossEntropyLoss            |
| **Device**     | CUDA-enabled GPU            |

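The settings in the table correspond to a conventional PyTorch fine-tuning loop. What follows is a minimal sketch under stated assumptions, not the script used to train this model. The `train_classifier` helper is hypothetical, and the learning rate of `2e-5` is an assumed value because the table gives none. Inputs are assumed to be tensors already produced by the tokenizer with `max_length=128`, with labels `0` = real and `1` = fake. For actual fine-tuning, `model` would be `BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)`.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset


def train_classifier(model, input_ids, attention_mask, labels,
                     epochs=3, batch_size=16, lr=2e-5, device="cpu"):
    """Hypothetical fine-tuning loop mirroring the Training Details table.

    Assumptions (not stated in the model card): lr=2e-5, labels 0 = real and
    1 = fake, and inputs pre-tokenized with max_length=128.
    Returns the mean training loss per example for each epoch.
    """
    dataset = TensorDataset(input_ids, attention_mask, labels)
    loader = DataLoader(dataset, batch_size=batch_size, shuffle=True)
    optimizer = torch.optim.AdamW(model.parameters(), lr=lr)  # Optimizer: AdamW
    loss_fn = torch.nn.CrossEntropyLoss()                     # Loss: CrossEntropyLoss

    model.to(device)
    model.train()  # enable dropout during fine-tuning
    epoch_losses = []
    for _ in range(epochs):
        total = 0.0
        for ids, mask, y in loader:
            ids, mask, y = ids.to(device), mask.to(device), y.to(device)
            optimizer.zero_grad()
            logits = model(input_ids=ids, attention_mask=mask).logits
            loss = loss_fn(logits, y)
            loss.backward()
            optimizer.step()
            total += loss.item() * y.size(0)  # undo the batch mean
        epoch_losses.append(total / len(dataset))
    return epoch_losses
```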
---

## 📊 Evaluation Metrics

| Metric    | Score |
| --------- | ----- |
| Accuracy  | 0.97  |
| Precision | 0.81  |

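Both metrics are computed from counts of correct and flagged predictions. The sketch below assumes that precision is reported with the **fake** class (label `1`) as the positive class. The card does not say which class the 0.81 refers to, and the `accuracy_and_precision` helper is hypothetical. If fake postings are a small minority of the evaluation data, high accuracy can coexist with noticeably lower precision on the fake class.

```python
def accuracy_and_precision(y_true, y_pred, positive=1):
    """Return (accuracy, precision) for binary labels.

    Assumes label 1 = fake (the positive class) and 0 = real.
    """
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equal length")
    pairs = list(zip(y_true, y_pred))
    correct = sum(t == p for t, p in pairs)
    true_pos = sum(t == positive and p == positive for t, p in pairs)
    pred_pos = sum(p == positive for p in y_pred)
    accuracy = correct / len(pairs)
    # Precision: of the postings flagged as fake, the fraction that really are fake
    precision = true_pos / pred_pos if pred_pos else 0.0
    return accuracy, precision


# Ten postings, two truly fake; the model flags three, one of them wrongly.
print(accuracy_and_precision([0, 0, 0, 0, 0, 0, 0, 1, 1, 0],
                             [0, 0, 0, 0, 0, 0, 1, 1, 1, 0]))
```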
---

## 🚀 Usage

```python
from transformers import BertTokenizerFast, BertForSequenceClassification
import torch

model_name = "AventIQ-AI/BERT-Spam-Job-Posting-Detection-Model"
tokenizer = BertTokenizerFast.from_pretrained(model_name)
model = BertForSequenceClassification.from_pretrained(model_name)
model.eval()  # disable dropout for inference

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)

def predict_with_bert(text):
    # Tokenize; inputs longer than 128 tokens are truncated to match training
    inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True, max_length=128)
    device = next(model.parameters()).device  # Get model device (cpu or cuda)
    inputs = {k: v.to(device) for k, v in inputs.items()}
    with torch.no_grad():
        logits = model(**inputs).logits

    # Class 1 = fake, class 0 = real
    predicted_class_id = logits.argmax(dim=-1).item()
    return "Fake Job" if predicted_class_id == 1 else "Real Job"

# Example
print(predict_with_bert("Hiring remote data entry clerk for a large online project. Apply now."))
print(predict_with_bert("Looking for a Software Engineer with 5+ years of experience in Python."))
```
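`predict_with_bert` returns only a label. For moderation queues it can help to keep a confidence score and to score several postings in one forward pass. The helper below is a hypothetical addition, not part of the model's published API. It applies a softmax over the two logits and keeps the mapping used above: class `1` is "Fake Job" and class `0` is "Real Job". Because the tokenizer call uses `padding=True`, passing a list of strings to `tokenizer` yields a padded batch whose logits can be fed to this function. The commented lines show that usage with the model and tokenizer loaded above; the example postings are illustrative.

```python
import torch

LABELS = {0: "Real Job", 1: "Fake Job"}  # same mapping as predict_with_bert


def logits_to_predictions(logits):
    """Convert a (batch, 2) logits tensor into (label, confidence) pairs.

    Hypothetical helper: confidence is the softmax probability of the
    predicted class.
    """
    probs = torch.softmax(logits, dim=-1)
    confidences, class_ids = probs.max(dim=-1)
    return [(LABELS[c], p) for c, p in zip(class_ids.tolist(), confidences.tolist())]


# Batch usage with the tokenizer, model, and device loaded above:
# texts = ["Earn $500/day from home, no experience needed!",
#          "Senior accountant wanted, CPA required."]
# inputs = tokenizer(texts, return_tensors="pt", truncation=True,
#                    padding=True, max_length=128).to(device)
# with torch.no_grad():
#     print(logits_to_predictions(model(**inputs).logits))
```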
---

## 🗂 Repository Structure

```
.
├── model/               # Quantized model files
├── tokenizer_config/    # Tokenizer and vocab files
├── model.safetensors    # Fine-tuned model in safetensors format
└── README.md            # Model card
```
---

## 🤝 Contributing

Contributions, issues, and feature requests are welcome! Feel free to open a pull request or raise an issue.