
🧠 NER-BERT-AI-Model-using-annotated-corpus-ner

A BERT-based Named Entity Recognition (NER) model fine-tuned on the Entity Annotated Corpus. It classifies tokens in text into predefined entity types such as Person (PER), Organization (ORG), and Location (LOC). This model is well-suited for information extraction, resume parsing, and chatbot applications.


✨ Model Highlights

  • 📌 Based on bert-base-cased (by Google)
  • 🔍 Fine-tuned on the Entity Annotated Corpus (ner_dataset.csv)
  • ⚡ Supports prediction of 3 entity types: PER, ORG, LOC
  • 💾 Compatible with the Hugging Face pipeline() for easy inference

🧠 Intended Uses

  • Resume and document parsing
  • Chatbots and virtual assistants
  • Named entity tagging in structured documents
  • Search and information retrieval systems
  • News or content analysis

🚫 Limitations

  • Trained only on formal English text
  • May not generalize well to informal text or domain-specific jargon
  • Subword tokenization may split entities (e.g., "Cupertino" → "Cup", "##ert", "##ino"; see the snippet after this list)
  • Limited to the entity types in the original dataset (PER, ORG, LOC only)
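
A quick way to see this subword behavior (a minimal sketch; the exact word pieces depend on the bert-base-cased vocabulary):

```python
from transformers import AutoTokenizer

# Load the base tokenizer used by this model
tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")

# Rare proper nouns are often split into word pieces marked with "##",
# so downstream code must merge the pieces back into one entity span
print(tokenizer.tokenize("Cupertino"))
```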

πŸ‹οΈβ€β™‚οΈ Training Details

| Field      | Value                          |
|------------|--------------------------------|
| Base Model | bert-base-cased                |
| Dataset    | Entity Annotated Corpus        |
| Framework  | PyTorch with Transformers      |
| Epochs     | 3                              |
| Batch Size | 16                             |
| Max Length | 128 tokens                     |
| Optimizer  | AdamW                          |
| Loss       | CrossEntropyLoss (token-level) |
| Device     | CUDA-enabled GPU               |
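
For reference, these hyperparameters map onto a Hugging Face Trainer setup roughly as below (a minimal sketch, not the exact training script; dataset loading, tokenization to max_length=128, and label alignment are omitted, and the learning rate is an assumption):

```python
from transformers import (AutoModelForTokenClassification, Trainer,
                          TrainingArguments)

# 7 labels: O, B-PER, I-PER, B-ORG, I-ORG, B-LOC, I-LOC
model = AutoModelForTokenClassification.from_pretrained(
    "bert-base-cased", num_labels=7)

training_args = TrainingArguments(
    output_dir="ner-bert",
    num_train_epochs=3,              # from the table above
    per_device_train_batch_size=16,  # from the table above
    learning_rate=5e-5,              # assumed value, not stated in the card
)

# Trainer uses AdamW and token-level cross-entropy loss by default
# trainer = Trainer(model=model, args=training_args,
#                   train_dataset=tokenized_train_dataset)
# trainer.train()
```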

📊 Evaluation Metrics

| Metric    | Score (%) |
|-----------|-----------|
| Precision | 83.15     |
| Recall    | 83.85     |
| F1-Score  | 83.50     |
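
These are standard entity-level NER scores; assuming they were computed with the common seqeval library, the calculation looks like this (toy label sequences for illustration, not the actual test set):

```python
from seqeval.metrics import f1_score, precision_score, recall_score

# Gold and predicted tag sequences in BIO format (illustrative only)
y_true = [["B-PER", "I-PER", "O", "O", "B-LOC"]]
y_pred = [["B-PER", "I-PER", "O", "B-LOC", "B-LOC"]]

print(precision_score(y_true, y_pred))  # entity-level precision
print(recall_score(y_true, y_pred))     # entity-level recall
print(f1_score(y_true, y_pred))         # entity-level F1
```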

🔎 Label Mapping

| Label ID | Entity Type |
|----------|-------------|
| 0        | O           |
| 1        | B-PER       |
| 2        | I-PER       |
| 3        | B-ORG       |
| 4        | I-ORG       |
| 5        | B-LOC       |
| 6        | I-LOC       |
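
In code, this mapping is just a dictionary from class index to BIO tag (a minimal sketch of decoding raw logits by hand; the pipeline in the Usage section below does this automatically):

```python
import torch

id2label = {0: "O", 1: "B-PER", 2: "I-PER",
            3: "B-ORG", 4: "I-ORG", 5: "B-LOC", 6: "I-LOC"}

def decode(logits: torch.Tensor) -> list:
    """Map token-classification logits of shape (1, seq_len, 7) to BIO tags."""
    ids = logits.argmax(dim=-1)[0]          # best label ID per token
    return [id2label[int(i)] for i in ids]  # IDs -> BIO tags
```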

πŸš€ Usage

```python
from transformers import (AutoModelForTokenClassification, AutoTokenizer,
                          pipeline)

model_name = "AventIQ-AI/NER-BERT-AI-Model-using-annotated-corpus-ner"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForTokenClassification.from_pretrained(model_name)

# aggregation_strategy="simple" merges word pieces back into whole entities
nlp = pipeline("ner", model=model, tokenizer=tokenizer,
               aggregation_strategy="simple")

example = "My name is Wolfgang and I live in Berlin"
ner_results = nlp(example)
print(ner_results)
```
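
With aggregation_strategy="simple", each item in ner_results is a dictionary describing one merged entity span (fields such as entity_group, score, word, start, and end), so "Wolfgang" should come back labeled PER and "Berlin" labeled LOC; exact scores will vary.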

🧩 Quantization

Post-training quantization can be applied using PyTorch to reduce model size and improve inference performance, especially on edge devices.
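
For example, PyTorch dynamic quantization swaps the model's Linear layers for int8 versions at load time (a minimal sketch for CPU inference; accuracy should be re-validated after quantizing):

```python
import torch
from transformers import AutoModelForTokenClassification

model = AutoModelForTokenClassification.from_pretrained(
    "AventIQ-AI/NER-BERT-AI-Model-using-annotated-corpus-ner")

# Replace nn.Linear modules with dynamically quantized int8 equivalents
quantized_model = torch.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8)
```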

🗂 Repository Structure

```
.
├── model/               # Trained model files
├── tokenizer_config/    # Tokenizer and vocab files
├── model.safetensors    # Model weights in safetensors format
└── README.md            # Model card
```

🤝 Contributing

We welcome feedback, bug reports, and improvements! Feel free to open an issue or submit a pull request.
