# 🧠 NER-BERT-AI-Model-using-annotated-corpus-ner

A BERT-based Named Entity Recognition (NER) model fine-tuned on the Entity Annotated Corpus. It classifies tokens in text into predefined entity types such as Person (PER), Organization (ORG), and Location (LOC). This model is well-suited for information extraction, resume parsing, and chatbot applications.

---

## ✨ Model Highlights

- 📌 Based on `bert-base-cased` (by Google)
- 🔁 Fine-tuned on the Entity Annotated Corpus (`ner_dataset.csv`)
- ⚡ Supports prediction of 3 entity types: PER, ORG, LOC
- 💾 Compatible with the Hugging Face `pipeline()` API for easy inference

---

## 🧠 Intended Uses

- Resume and document parsing
- Chatbots and virtual assistants
- Named entity tagging in structured documents
- Search and information retrieval systems
- News or content analysis

---

## 🚫 Limitations

- Trained only on formal English text
- May not generalize well to informal text or domain-specific jargon
- Subword tokenization may split entities (e.g., "Cupertino" → "Cup", "##ert", "##ino"); see the sketch after this list
- Limited to the entity types present in the original dataset (PER, ORG, LOC only)

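A quick way to see this splitting, using the `bert-base-cased` tokenizer the model builds on:

```python
from transformers import AutoTokenizer

# WordPiece splits rare words into "##"-prefixed subword pieces.
tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
print(tokenizer.tokenize("Cupertino"))  # e.g. ['Cup', '##ert', '##ino']
```
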
---

## 🏋️‍♂️ Training Details

| Field      | Value                          |
|------------|--------------------------------|
| Base Model | `bert-base-cased`              |
| Dataset    | Entity Annotated Corpus        |
| Framework  | PyTorch with Transformers      |
| Epochs     | 3                              |
| Batch Size | 16                             |
| Max Length | 128 tokens                     |
| Optimizer  | AdamW                          |
| Loss       | CrossEntropyLoss (token-level) |
| Device     | CUDA-enabled GPU               |

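As a rough illustration, the hyperparameters above map onto a Transformers fine-tuning setup along the following lines. This is a minimal sketch, not the original training script: the toy dataset stands in for the tokenized `ner_dataset.csv`, and the learning rate is an assumption (it is not listed in the table).

```python
from datasets import Dataset
from transformers import (AutoModelForTokenClassification, Trainer,
                          TrainingArguments)

# Toy stand-in for the tokenized corpus; the real preprocessing (reading
# ner_dataset.csv, aligning labels to subword tokens, truncating to 128)
# is omitted here.
train_dataset = Dataset.from_dict({
    "input_ids": [[101, 1000, 2000, 102]],  # [CLS], two arbitrary ids, [SEP]
    "attention_mask": [[1, 1, 1, 1]],
    "labels": [[-100, 0, 0, -100]],         # -100 is ignored by the loss
})

model = AutoModelForTokenClassification.from_pretrained(
    "bert-base-cased", num_labels=7)        # 7 labels, per the mapping below

args = TrainingArguments(
    output_dir="ner-bert",
    num_train_epochs=3,                     # Epochs
    per_device_train_batch_size=16,         # Batch Size
    learning_rate=5e-5,                     # assumption: not stated above
)

# Trainer defaults to AdamW and a token-level cross-entropy loss,
# matching the table.
Trainer(model=model, args=args, train_dataset=train_dataset).train()
```
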
---

## 📊 Evaluation Metrics

| Metric    | Score (%) |
|-----------|-----------|
| Precision | 83.15     |
| Recall    | 83.85     |
| F1-Score  | 83.50     |

---

## 🔎 Label Mapping

| Label ID | Entity Type |
|----------|-------------|
| 0        | O           |
| 1        | B-PER       |
| 2        | I-PER       |
| 3        | B-ORG       |
| 4        | I-ORG       |
| 5        | B-LOC       |
| 6        | I-LOC       |

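In code, this table is just the `id2label` / `label2id` pair stored in the model config; a minimal sketch for decoding raw predicted ids by hand:

```python
id2label = {
    0: "O",
    1: "B-PER", 2: "I-PER",
    3: "B-ORG", 4: "I-ORG",
    5: "B-LOC", 6: "I-LOC",
}
label2id = {label: i for i, label in id2label.items()}

# Map a sequence of predicted label ids back to tag strings.
predicted_ids = [0, 1, 2, 0, 5]
print([id2label[i] for i in predicted_ids])  # ['O', 'B-PER', 'I-PER', 'O', 'B-LOC']
```
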
---

## 🚀 Usage

```python
from transformers import (AutoModelForTokenClassification, AutoTokenizer,
                          pipeline)

model_name = "AventIQ-AI/NER-BERT-AI-Model-using-annotated-corpus-ner"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForTokenClassification.from_pretrained(model_name)

# Token-classification pipeline: returns one dict per predicted entity token.
nlp = pipeline("ner", model=model, tokenizer=tokenizer)

example = "My name is Wolfgang and I live in Berlin"
ner_results = nlp(example)
print(ner_results)
```

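Because of the subword splitting noted under Limitations, the raw output above is per-token (pieces such as `##ert` show up as separate entries). Passing `aggregation_strategy="simple"` asks the pipeline to merge those pieces into whole entities using the label mapping above; a minimal variant of the call, reusing the `model` and `tokenizer` defined above:

```python
nlp_grouped = pipeline("ner", model=model, tokenizer=tokenizer,
                       aggregation_strategy="simple")
print(nlp_grouped("My name is Wolfgang and I live in Berlin"))
# e.g. [{'entity_group': 'PER', 'word': 'Wolfgang', ...},
#       {'entity_group': 'LOC', 'word': 'Berlin', ...}]
```
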
## 🧩 Quantization

Post-training quantization can be applied using PyTorch to reduce model size and improve inference performance, especially on edge devices.

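One common recipe is PyTorch dynamic quantization, which converts the model's `Linear` layers to int8 for CPU inference. A minimal sketch (an illustration, not necessarily the exact recipe used for this model):

```python
import torch
from transformers import AutoModelForTokenClassification

model = AutoModelForTokenClassification.from_pretrained(
    "AventIQ-AI/NER-BERT-AI-Model-using-annotated-corpus-ner")

# Swap Linear layers for dynamically quantized int8 versions.
quantized_model = torch.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8)
```
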
## 🗂 Repository Structure

```
.
├── model/               # Trained model files
├── tokenizer_config/    # Tokenizer and vocab files
├── model.safetensors    # Model weights in safetensors format
└── README.md            # Model card
```

## 🤝 Contributing

We welcome feedback, bug reports, and improvements!
Feel free to open an issue or submit a pull request.