# RoBERTa-Base Quantized Model for Named Entity Recognition (NER)

This repository contains a quantized version of the RoBERTa model fine-tuned for Named Entity Recognition (NER) on the WikiANN (English) dataset. The model is particularly suitable for **tagging named entities in news articles**, such as persons, organizations, and locations. It has been optimized for efficient deployment using quantization techniques.
## Model Details

- **Model Architecture:** RoBERTa Base
- **Task:** Named Entity Recognition
- **Dataset:** WikiANN (English)
- **Use Case:** Tagging news articles with named entities
- **Quantization:** Float16
- **Fine-tuning Framework:** Hugging Face Transformers
## Usage

### Installation
```sh
pip install transformers torch
```
### Loading the Model
```python
from transformers import RobertaTokenizerFast, RobertaForTokenClassification, pipeline
import torch

# Load tokenizer
tokenizer = RobertaTokenizerFast.from_pretrained("roberta-base")

# Load the fine-tuned, quantized model
# (placeholder path: replace with the local path or Hub repo ID of this model)
model = RobertaForTokenClassification.from_pretrained("path/to/quantized-ner-model")

# Create NER pipeline
ner_pipeline = pipeline(
    "ner",
    model=model,
    tokenizer=tokenizer,
    aggregation_strategy="simple"
)

# Sample news headline
text = "Apple Inc. is planning to open a new campus in London by the end of 2025."

# Inference
entities = ner_pipeline(text)

# Display results
for ent in entities:
    print(f"{ent['word']}: {ent['entity_group']} ({ent['score']:.2f})")
```
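For the sample headline above, the pipeline is expected to return aggregated entity spans such as an organization for "Apple Inc." and a location for "London"; the exact labels and confidence scores depend on the fine-tuned weights.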
## Performance Metrics

- **Accuracy:** 0.923422
- **Precision:** 0.923052
- **Recall:** 0.923422
- **F1:** 0.923150
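The reported accuracy and recall coincide, which is consistent with token-level scores under weighted averaging. A minimal illustrative sketch of how such metrics could be computed from flattened tag sequences, assuming scikit-learn (not a dependency of this repository) and dummy data, is:

```python
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

# Illustrative gold and predicted tag sequences, flattened across the eval set
y_true = ["B-ORG", "I-ORG", "O", "B-LOC", "O"]
y_pred = ["B-ORG", "I-ORG", "O", "B-LOC", "B-PER"]

accuracy = accuracy_score(y_true, y_pred)
precision, recall, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average="weighted", zero_division=0
)
print(f"Accuracy: {accuracy:.4f}  Precision: {precision:.4f}  "
      f"Recall: {recall:.4f}  F1: {f1:.4f}")
```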
## Fine-Tuning Details

### Dataset

The model was fine-tuned on the WikiANN (English) dataset, available through the Hugging Face `datasets` library.
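For illustration, the English subset can be loaded as sketched below; the exact preprocessing used for fine-tuning is not shown here.

```python
from datasets import load_dataset

# Load the English configuration of WikiANN from the Hugging Face Hub
dataset = load_dataset("wikiann", "en")

print(dataset)              # train/validation/test splits
print(dataset["train"][0])  # tokens and NER tags for one example
```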
### Training

- Number of epochs: 5
- Batch size: 16
- Evaluation strategy: epoch
- Learning rate: 3e-5
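A minimal sketch of how these hyperparameters map onto Hugging Face `TrainingArguments` is shown below; the label count, tokenizer options, and output directory are illustrative assumptions rather than the exact fine-tuning script.

```python
from transformers import RobertaForTokenClassification, RobertaTokenizerFast, TrainingArguments

# WikiANN uses IOB2 tags for PER, ORG, and LOC, i.e. 7 labels in total
model = RobertaForTokenClassification.from_pretrained("roberta-base", num_labels=7)

# add_prefix_space=True is required when tokenizing pre-split words with RoBERTa
tokenizer = RobertaTokenizerFast.from_pretrained("roberta-base", add_prefix_space=True)

training_args = TrainingArguments(
    output_dir="./roberta-ner-wikiann",  # placeholder output directory
    num_train_epochs=5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    evaluation_strategy="epoch",         # renamed to eval_strategy in newer transformers releases
    learning_rate=3e-5,
)

# These arguments would then be passed to transformers.Trainer together with the
# tokenized, label-aligned WikiANN train and validation splits.
```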
### Quantization

Post-training quantization was applied by converting the fine-tuned model weights to float16 with PyTorch, reducing the model size and improving inference efficiency.
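A minimal sketch of this kind of float16 conversion, assuming the fine-tuned checkpoint has already been saved locally (paths are placeholders), is:

```python
from transformers import RobertaForTokenClassification

# Load the fine-tuned full-precision checkpoint (placeholder path)
model = RobertaForTokenClassification.from_pretrained("./roberta-ner-wikiann")

# Convert all weights to float16 and save the quantized checkpoint
model = model.half()
model.save_pretrained("./roberta-ner-wikiann-fp16")
```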
## Repository Structure

```
.
├── config.json
├── tokenizer_config.json
├── special_tokens_map.json
├── tokenizer.json
├── model.safetensors    # Fine-tuned model
└── README.md            # Model documentation
```
## Limitations

- The model may not generalize well to domains outside the fine-tuning dataset.
- Quantization may result in minor accuracy degradation compared to full-precision models.
## Contributing

Contributions are welcome! Feel free to open an issue or submit a pull request if you have suggestions or improvements.