# 🧠 NER-BERT-AI-Model-using-annotated-corpus-ner
A BERT-based Named Entity Recognition (NER) model fine-tuned on the Entity Annotated Corpus. It classifies tokens in text into predefined entity types such as Person (PER), Organization (ORG), and Location (LOC). This model is well-suited for information extraction, resume parsing, and chatbot applications.
## ✨ Model Highlights

- 📌 Based on `bert-base-cased` (by Google)
- 📚 Fine-tuned on the Entity Annotated Corpus (`ner_dataset.csv`)
- ⚡ Supports prediction of 3 entity types: PER, ORG, LOC
- 💾 Compatible with the Hugging Face `pipeline()` API for easy inference
## 🧠 Intended Uses
- Resume and document parsing
- Chatbots and virtual assistants
- Named entity tagging in structured documents
- Search and information retrieval systems
- News or content analysis
## 🚫 Limitations
- Trained only on formal English text
- May not generalize well to informal text or domain-specific jargon
- Subword tokenization may split entities (e.g., "Cupertino" → "Cup", "##ert", "##ino"); see the aggregation sketch after this list
- Limited to the entities available in the original dataset (PER, ORG, LOC only)
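
If subword splitting is a problem downstream, the `pipeline()` API can merge the pieces back into word-level entities. A minimal sketch, assuming the Hub ID used in the Usage section below:

```python
# Sketch: mitigate subword splitting by letting the pipeline merge
# subword pieces back into word-level entities.
from transformers import pipeline

nlp = pipeline(
    "ner",
    model="AventIQ-AI/NER-BERT-AI-Model-using-annotated-corpus-ner",
    aggregation_strategy="simple",  # merges "Cup", "##ert", "##ino" back into "Cupertino"
)
print(nlp("Apple is headquartered in Cupertino"))
```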
## 🏋️‍♂️ Training Details

| Field | Value |
|---|---|
| Base Model | `bert-base-cased` |
| Dataset | Entity Annotated Corpus (`ner_dataset.csv`) |
| Framework | PyTorch with Transformers |
| Epochs | 3 |
| Batch Size | 16 |
| Max Length | 128 tokens |
| Optimizer | AdamW |
| Loss | CrossEntropyLoss (token-level) |
| Device | CUDA-enabled GPU |
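
For orientation, here is a minimal fine-tuning sketch consistent with the table above. Dataset preparation (reading `ner_dataset.csv` and aligning word-level labels to subword tokens at `max_length=128`) is omitted, and the learning rate is an assumption the card does not state:

```python
# Hedged fine-tuning sketch matching the hyperparameters above.
# `train_dataset` is assumed: a tokenized token-classification dataset
# built from ner_dataset.csv (labels aligned to subwords, max_length=128).
from transformers import (
    AutoTokenizer,
    AutoModelForTokenClassification,
    DataCollatorForTokenClassification,
    Trainer,
    TrainingArguments,
)

tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
model = AutoModelForTokenClassification.from_pretrained(
    "bert-base-cased",
    num_labels=7,  # O, B/I-PER, B/I-ORG, B/I-LOC (see Label Mapping below)
)

args = TrainingArguments(
    output_dir="ner-bert",
    num_train_epochs=3,              # Epochs
    per_device_train_batch_size=16,  # Batch Size
    learning_rate=2e-5,              # assumption: typical BERT fine-tuning value
)

# Trainer uses AdamW and token-level CrossEntropyLoss by default,
# matching the Optimizer and Loss rows above.
trainer = Trainer(
    model=model,
    args=args,
    train_dataset=train_dataset,  # assumed to be prepared as described
    data_collator=DataCollatorForTokenClassification(tokenizer),
)
trainer.train()
```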
## 📊 Evaluation Metrics

| Metric | Score (%) |
|---|---|
| Precision | 83.15 |
| Recall | 83.85 |
| F1-Score | 83.50 |
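
The card does not state how these scores were computed; a common choice for NER is entity-level scoring with the `seqeval` library (an assumption here). A small sketch:

```python
# Sketch: entity-level precision/recall/F1 with seqeval
# (assumption: the card does not name its evaluation tooling).
from seqeval.metrics import precision_score, recall_score, f1_score

# Gold and predicted label sequences, one list per sentence (illustrative):
true_labels = [["B-PER", "I-PER", "O", "B-LOC"]]
pred_labels = [["B-PER", "I-PER", "O", "B-LOC"]]

print(precision_score(true_labels, pred_labels))
print(recall_score(true_labels, pred_labels))
print(f1_score(true_labels, pred_labels))
```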
## 🔖 Label Mapping

| Label ID | Entity Type |
|---|---|
| 0 | O |
| 1 | B-PER |
| 2 | I-PER |
| 3 | B-ORG |
| 4 | I-ORG |
| 5 | B-LOC |
| 6 | I-LOC |
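
In code, this table corresponds to the `id2label`/`label2id` dictionaries stored in the model's `config.json`. A sketch of the same mapping as Python dicts (check the hosted config for the exact values):

```python
# The Label Mapping table as id2label/label2id dicts.
id2label = {
    0: "O",
    1: "B-PER",
    2: "I-PER",
    3: "B-ORG",
    4: "I-ORG",
    5: "B-LOC",
    6: "I-LOC",
}
label2id = {label: idx for idx, label in id2label.items()}
```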
## 🚀 Usage

```python
from transformers import AutoTokenizer, AutoModelForTokenClassification, pipeline

model_name = "AventIQ-AI/NER-BERT-AI-Model-using-annotated-corpus-ner"

# Load the fine-tuned model and its tokenizer from the Hub
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForTokenClassification.from_pretrained(model_name)

# Build a token-classification pipeline for inference
nlp = pipeline("ner", model=model, tokenizer=tokenizer)

example = "My name is Wolfgang and I live in Berlin"
ner_results = nlp(example)
print(ner_results)  # list of dicts with entity, score, word, start, end
```
## 🧩 Quantization
Post-training quantization can be applied using PyTorch to reduce model size and improve inference performance, especially on edge devices.
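
As a concrete illustration, dynamic quantization of the model's `Linear` layers takes only a few lines in PyTorch; this is a sketch, not a shipped quantized checkpoint:

```python
# Sketch: post-training dynamic quantization with PyTorch.
import torch
from transformers import AutoModelForTokenClassification

model = AutoModelForTokenClassification.from_pretrained(
    "AventIQ-AI/NER-BERT-AI-Model-using-annotated-corpus-ner"
)
quantized_model = torch.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8  # quantize Linear layers to int8
)
```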
## 📁 Repository Structure

```
.
├── model/               # Trained model files
├── tokenizer_config/    # Tokenizer and vocab files
├── model.safetensors    # Model weights in safetensors format
└── README.md            # Model card
```
## 🤝 Contributing
We welcome feedback, bug reports, and improvements! Feel free to open an issue or submit a pull request.