Update README.md
README.md
CHANGED
@@ -44,12 +44,12 @@ base_model:



-# EntityBERT

 ## Model Details

 ### Description
-The `boltuix/EntityBERT

 - **Dataset**: [boltuix/conll2025-ner](https://huggingface.co/datasets/boltuix/conll2025-ner) (~143,709 entries, 6.38 MB)
 - **Entity Types**: 36 NER tags (18 entity categories with B-/I- tags + O)
@@ -68,7 +68,7 @@ The `boltuix/EntityBERT-NER` model is a fine-tuned transformer for **Named Entit
 - **Parameters**: ~11M

 ### Links
-- **Model Repository**: [boltuix/EntityBERT
 - **Dataset**: [boltuix/conll2025-ner](#download-instructions)
 - **Hugging Face Docs**: [Transformers](https://huggingface.co/docs/transformers)
 - **Demo**: Available at [boltuix.github.io/demo](https://boltuix.github.io/demo) (coming soon)
@@ -102,8 +102,8 @@ Use the model for NER with the following Python code:
 from transformers import AutoTokenizer, AutoModelForTokenClassification, pipeline

 # Load model and tokenizer
-tokenizer = AutoTokenizer.from_pretrained("boltuix/EntityBERT
-model = AutoModelForTokenClassification.from_pretrained("boltuix/EntityBERT

 # Create NER pipeline
 nlp = pipeline("token-classification", model=model, tokenizer=tokenizer)
@@ -231,7 +231,7 @@ These high scores demonstrate the model's ability to accurately identify entit

 ## Training the Model

-Fine-tune the `boltuix/bert-mini` model on the `boltuix/conll2025-ner` dataset to replicate or extend `EntityBERT

 ```python
 # Install dependencies
@@ -296,7 +296,7 @@ model = AutoModelForTokenClassification.from_pretrained("boltuix/bert-mini", num

 # Training arguments
 args = TrainingArguments(
-    output_dir="boltuix/
     eval_strategy="epoch",
     learning_rate=2e-5,
     per_device_train_batch_size=16,
@@ -347,8 +347,8 @@ trainer = Trainer(
 trainer.train()

 # Save model
-trainer.save_model("boltuix/
-tokenizer.save_pretrained("boltuix/
 ```

 ### Tips
@@ -381,7 +381,7 @@ pip install transformers torch pandas pyarrow seqeval
 - **Optional**: NVIDIA CUDA for GPU acceleration

 ### Download Instructions
-- **Model**: [boltuix/EntityBERT
 - **Dataset**: [boltuix/conll2025-ner](https://huggingface.co/datasets/boltuix/conll2025-ner)
 - Load with Hugging Face `datasets` or pandas.

@@ -394,7 +394,7 @@ Evaluate the model on custom data:
 from transformers import pipeline

 # Load NER pipeline
-nlp = pipeline("token-classification", model="boltuix/EntityBERT

 # Test data
 text = "Book a Lyft from Metropolis on December 1, 2025, contact [email protected]."
@@ -468,7 +468,7 @@ plt.show()
 ## Comparison to Other Models
 | Model | Dataset | Parameters | F1 Score | Size |
 |----------------------|--------------------|------------|----------|--------|
-| **EntityBERT
 | BERT-base-NER | CoNLL-2003 | ~110M | ~0.89 | ~400 MB|
 | DistilBERT-NER | CoNLL-2003 | ~66M | ~0.85 | ~200 MB|

@@ -481,7 +481,7 @@ plt.show()

 ## Community and Support
 - Explore: [Hugging Face Community](https://huggingface.co/community)
-- Contribute: [boltuix/EntityBERT
 - Discuss: [Hugging Face Forums](https://huggingface.co/discussions)
 - Learn: [Transformers Docs](https://huggingface.co/docs/transformers)
 - Contact: Boltuix at [[email protected]](mailto:[email protected])
@@ -44,12 +44,12 @@ base_model:



+# EntityBERT Model

 ## Model Details

 ### Description
+The `boltuix/EntityBERT` model is a fine-tuned transformer for **Named Entity Recognition (NER)**, built on the lightweight `boltuix/bert-mini` base model. It excels at identifying 36 entity types, including people, locations, organizations, dates, times, phone numbers, emails, URLs, and more, in English text. Designed for efficiency and high accuracy, it's perfect for real-time applications like information extraction, chatbots, and knowledge graph construction across domains such as travel, medical, logistics, and education.

 - **Dataset**: [boltuix/conll2025-ner](https://huggingface.co/datasets/boltuix/conll2025-ner) (~143,709 entries, 6.38 MB)
 - **Entity Types**: 36 NER tags (18 entity categories with B-/I- tags + O)
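
The Entity Types line in this hunk describes a BIO tagging scheme. The following is a minimal sketch of how such a label set is commonly built, assuming the standard scheme in which every category gets a `B-` (begin) and an `I-` (inside) tag and `O` marks non-entity tokens. The category names are hypothetical placeholders, not the dataset's actual tag names. Under this scheme, 18 categories yield 36 `B-`/`I-` tags, and `O` adds one more label id.

```python
# Hypothetical category names for illustration only; the real tag set
# is defined by the dataset and may differ.
CATEGORIES = ["PERSON", "ORG", "GPE", "DATE"]


def build_bio_labels(categories):
    """Return the BIO label list: "O" first, then B-/I- tags per category."""
    labels = ["O"]
    for category in categories:
        labels.append(f"B-{category}")
        labels.append(f"I-{category}")
    return labels


labels = build_bio_labels(CATEGORIES)

# Mappings of the kind usually stored in a token-classification model config.
id2label = dict(enumerate(labels))
label2id = {label: idx for idx, label in id2label.items()}

print(len(labels))  # 2 * 4 categories + "O" = 9
```

Keeping `O` at index 0 is a convention rather than a requirement; what matters is that the same mapping is used for training and inference.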
@@ -68,7 +68,7 @@ The `boltuix/EntityBERT-NER` model is a fine-tuned transformer for **Named Entit
 - **Parameters**: ~11M

 ### Links
+- **Model Repository**: [boltuix/EntityBERT](https://huggingface.co/boltuix/EntityBERT)
 - **Dataset**: [boltuix/conll2025-ner](#download-instructions)
 - **Hugging Face Docs**: [Transformers](https://huggingface.co/docs/transformers)
 - **Demo**: Available at [boltuix.github.io/demo](https://boltuix.github.io/demo) (coming soon)
@@ -102,8 +102,8 @@ Use the model for NER with the following Python code:
 from transformers import AutoTokenizer, AutoModelForTokenClassification, pipeline

 # Load model and tokenizer
+tokenizer = AutoTokenizer.from_pretrained("boltuix/EntityBERT")
+model = AutoModelForTokenClassification.from_pretrained("boltuix/EntityBERT")

 # Create NER pipeline
 nlp = pipeline("token-classification", model=model, tokenizer=tokenizer)
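
By default, a `token-classification` pipeline like the one in this hunk returns one prediction per token, so consecutive `B-`/`I-` tags still have to be merged into entities (the pipeline's `aggregation_strategy` argument can do this for you). The sketch below shows the merging idea in plain Python so it runs without downloading the model. The tokens and tags are hand-written, hypothetical examples, not actual output from `boltuix/EntityBERT`.

```python
def bio_to_spans(tokens, tags):
    """Group token-level BIO tags into (label, text) entity spans."""
    spans = []
    current_label, current_tokens = None, []

    def flush():
        # Close the currently open span, if there is one.
        nonlocal current_label, current_tokens
        if current_label is not None:
            spans.append((current_label, " ".join(current_tokens)))
        current_label, current_tokens = None, []

    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            flush()
            current_label, current_tokens = tag[2:], [token]
        elif tag.startswith("I-") and current_label == tag[2:]:
            current_tokens.append(token)
        else:
            # "O", or an I- tag that does not continue the open span;
            # this sketch simply drops such stray I- tokens.
            flush()
    flush()
    return spans


# Hypothetical tokens and tags, written by hand for illustration.
tokens = ["Book", "a", "Lyft", "from", "Metropolis", "on", "December", "1", ",", "2025"]
tags = ["O", "O", "B-ORG", "O", "B-GPE", "O", "B-DATE", "I-DATE", "I-DATE", "I-DATE"]

for label, text in bio_to_spans(tokens, tags):
    print(label, "->", text)
```

Joining tokens with spaces is a simplification; a real pipeline would use character offsets to recover the exact surface text.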
@@ -231,7 +231,7 @@ These high scores demonstrate the model's ability to accurately identify entit

 ## Training the Model

+Fine-tune the `boltuix/bert-mini` model on the `boltuix/conll2025-ner` dataset to replicate or extend `EntityBERT`. Below is a training script:

 ```python
 # Install dependencies
@@ -296,7 +296,7 @@ model = AutoModelForTokenClassification.from_pretrained("boltuix/bert-mini", num

 # Training arguments
 args = TrainingArguments(
+    output_dir="boltuix/EntityBERT",
     eval_strategy="epoch",
     learning_rate=2e-5,
     per_device_train_batch_size=16,
@@ -347,8 +347,8 @@ trainer = Trainer(
 trainer.train()

 # Save model
+trainer.save_model("boltuix/EntityBERT")
+tokenizer.save_pretrained("boltuix/EntityBERT")
 ```

 ### Tips
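
Only part of the training script is visible in this diff. One step that BERT-style NER fine-tuning generally needs is aligning word-level tags with subword tokens, because the tokenizer may split a single word into several pieces. The sketch below shows one common convention, which may or may not match what the README's script does: label the first subword of each word and mask the rest with `-100`, the default `ignore_index` of PyTorch's cross-entropy loss. It assumes `word_ids` has the shape returned by a fast tokenizer's `word_ids()` method; the sample values are hypothetical.

```python
IGNORE_INDEX = -100  # default ignore_index of PyTorch's cross-entropy loss


def align_labels(word_ids, word_labels):
    """Map word-level label ids onto subword tokens.

    word_ids:    one entry per token; None for special tokens,
                 otherwise the index of the word the token belongs to.
    word_labels: one label id per word.
    """
    aligned = []
    previous = None
    for word_id in word_ids:
        if word_id is None:
            aligned.append(IGNORE_INDEX)  # special tokens such as [CLS], [SEP]
        elif word_id != previous:
            aligned.append(word_labels[word_id])  # first subword of a word
        else:
            aligned.append(IGNORE_INDEX)  # later subwords of the same word
        previous = word_id
    return aligned


# Hypothetical example: three words, the second split into two subwords.
word_ids = [None, 0, 1, 1, 2, None]
word_labels = [0, 3, 0]
print(align_labels(word_ids, word_labels))
```

Masking later subwords keeps each word from being counted more than once in the loss; labeling every subword is an equally valid alternative.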
@@ -381,7 +381,7 @@ pip install transformers torch pandas pyarrow seqeval
 - **Optional**: NVIDIA CUDA for GPU acceleration

 ### Download Instructions
+- **Model**: [boltuix/EntityBERT](https://huggingface.co/boltuix/EntityBERT)
 - **Dataset**: [boltuix/conll2025-ner](https://huggingface.co/datasets/boltuix/conll2025-ner)
 - Load with Hugging Face `datasets` or pandas.

@@ -394,7 +394,7 @@ Evaluate the model on custom data:
 from transformers import pipeline

 # Load NER pipeline
+nlp = pipeline("token-classification", model="boltuix/EntityBERT")

 # Test data
 text = "Book a Lyft from Metropolis on December 1, 2025, contact [email protected]."
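
When evaluating on custom data, NER quality is usually reported at the entity level: a prediction counts as correct only if both its label and its span boundaries match the reference. The README lists `seqeval` among its dependencies for this kind of metric. The sketch below is a simplified stand-in for the idea, not `seqeval`'s implementation, and the spans are hypothetical `(label, start, end)` tuples rather than real model output.

```python
def entity_prf(gold_spans, pred_spans):
    """Return entity-level (precision, recall, f1) using exact span matching."""
    gold, pred = set(gold_spans), set(pred_spans)
    true_positives = len(gold & pred)
    precision = true_positives / len(pred) if pred else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical reference and predicted spans as (label, start_token, end_token).
gold = [("ORG", 2, 3), ("GPE", 4, 5), ("DATE", 6, 10)]
pred = [("ORG", 2, 3), ("GPE", 4, 5), ("DATE", 6, 8)]  # DATE boundary is wrong

precision, recall, f1 = entity_prf(gold, pred)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

Exact matching is strict: the truncated `DATE` span counts as both a false positive and a false negative, which is why a boundary error lowers precision and recall at the same time.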
@@ -468,7 +468,7 @@ plt.show()
 ## Comparison to Other Models
 | Model | Dataset | Parameters | F1 Score | Size |
 |----------------------|--------------------|------------|----------|--------|
+| **EntityBERT** | conll2025-ner | ~11M | 0.89 | ~50 MB |
 | BERT-base-NER | CoNLL-2003 | ~110M | ~0.89 | ~400 MB|
 | DistilBERT-NER | CoNLL-2003 | ~66M | ~0.85 | ~200 MB|

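
The Size column in this table tracks the Parameters column closely. As a back-of-envelope check, assuming weights are stored as 32-bit floats (4 bytes each) and ignoring tokenizer files and serialization overhead, the on-disk size can be estimated directly from the parameter count. The parameter counts are the approximate figures from the table; the helper name is made up for this sketch.

```python
BYTES_PER_FP32 = 4  # assumption: weights stored as 32-bit floats


def fp32_size_mb(num_params):
    """Estimate weight size in megabytes (10^6 bytes) for fp32 parameters."""
    return num_params * BYTES_PER_FP32 / 1e6


# Approximate parameter counts taken from the comparison table.
models = [
    ("EntityBERT", 11e6),
    ("BERT-base-NER", 110e6),
    ("DistilBERT-NER", 66e6),
]

for name, params in models:
    print(f"{name}: ~{fp32_size_mb(params):.0f} MB")
```

The estimates (about 44, 440, and 264 MB) land in the same range as the table's rounded sizes, which is consistent with the roughly tenfold size difference between EntityBERT and BERT-base-NER coming from parameter count alone.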
@@ -481,7 +481,7 @@ plt.show()

 ## Community and Support
 - Explore: [Hugging Face Community](https://huggingface.co/community)
+- Contribute: [boltuix/EntityBERT](https://huggingface.co/boltuix/EntityBERT)
 - Discuss: [Hugging Face Forums](https://huggingface.co/discussions)
 - Learn: [Transformers Docs](https://huggingface.co/docs/transformers)
 - Contact: Boltuix at [[email protected]](mailto:[email protected])