BART-Based Text Summarization Model for News Aggregation
This repository hosts a BART transformer model fine-tuned for abstractive text summarization of news articles. It is designed to condense lengthy news reports into concise, informative summaries, enhancing user experience for news readers and aggregators.
Model Details
- Model Architecture: BART (Facebook's BART-base)
- Task: Abstractive Text Summarization
- Domain: News Articles
- Dataset: Reddit-TIFU (Hugging Face Datasets)
- Fine-tuning Framework: Hugging Face Transformers
Usage
Installation
```bash
pip install datasets transformers rouge-score evaluate
```
Loading the Model
```python
from transformers import BartTokenizer, BartForConditionalGeneration
import torch

# Load the tokenizer and model; replace model_name with the path to this
# repository's checkpoint to load the fine-tuned weights
device = "cuda" if torch.cuda.is_available() else "cpu"
model_name = "facebook/bart-base"
tokenizer = BartTokenizer.from_pretrained(model_name)
model = BartForConditionalGeneration.from_pretrained(model_name).to(device)
```
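Generating a Summary
A minimal inference sketch; the article text and generation parameters (beam count, length limits) are illustrative assumptions, not settings recorded in this repository:
```python
# Example news snippet (illustrative)
article = (
    "The city council voted on Tuesday to approve a new transit plan that "
    "expands bus service and adds protected bike lanes downtown."
)

# Tokenize, generate with beam search, and decode the summary
inputs = tokenizer(article, max_length=1024, truncation=True, return_tensors="pt").to(device)
summary_ids = model.generate(inputs["input_ids"], num_beams=4, max_length=128, early_stopping=True)
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))
```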
Performance Metrics
- ROUGE-1: 25.50
- ROUGE-2: 7.86
- ROUGE-L: 20.64
- ROUGE-Lsum: 21.18
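These scores can be recomputed with the `evaluate` package installed above; a minimal sketch, where the predictions and references are placeholders:
```python
import evaluate

# Score generated summaries against reference summaries with ROUGE
rouge = evaluate.load("rouge")
predictions = ["council approves transit plan"]                # model outputs
references = ["city council approves expanded transit plan"]  # gold summaries
print(rouge.compute(predictions=predictions, references=references))
# -> {'rouge1': ..., 'rouge2': ..., 'rougeL': ..., 'rougeLsum': ...}
```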
Fine-Tuning Details
Dataset
The dataset is sourced from Hugging Face's Reddit-TIFU dataset. It contains roughly 79,000 Reddit posts and their summaries. The original training and testing sets were merged, shuffled, and re-split using a 90/10 ratio, as sketched below.
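A sketch of the loading and re-splitting step, assuming the `short` configuration (whose size matches the ~79,000 posts above) and an arbitrary seed; the original split parameters were not recorded:
```python
from datasets import load_dataset

# Load Reddit-TIFU (recent `datasets` releases may require trust_remote_code=True)
dataset = load_dataset("reddit_tifu", "short", split="train")

# Shuffle and re-split 90/10 into training and test sets
splits = dataset.shuffle(seed=42).train_test_split(test_size=0.1)
train_ds, test_ds = splits["train"], splits["test"]
```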
Training Configuration
- Epochs: 3
- Batch Size: 8
- Learning Rate: 2e-5
- Evaluation Strategy: epoch
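A minimal sketch of how these settings map onto the Hugging Face `Trainer` API, continuing from the snippets above; the output path, tokenization lengths, and the column names `documents`/`title` are assumptions:
```python
from transformers import Trainer, TrainingArguments, DataCollatorForSeq2Seq

def preprocess(batch):
    # Tokenize posts as inputs and their summaries as labels (assumed column names)
    model_inputs = tokenizer(batch["documents"], max_length=1024, truncation=True)
    labels = tokenizer(text_target=batch["title"], max_length=128, truncation=True)
    model_inputs["labels"] = labels["input_ids"]
    return model_inputs

training_args = TrainingArguments(
    output_dir="./bart-reddit-tifu",  # hypothetical output path
    num_train_epochs=3,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    learning_rate=2e-5,
    eval_strategy="epoch",            # `evaluation_strategy` on transformers < 4.41
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_ds.map(preprocess, batched=True),
    eval_dataset=test_ds.map(preprocess, batched=True),
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
)
trainer.train()
```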
Quantization
Post-training quantization was applied using PyTorch's built-in quantization framework to reduce the model size and improve inference efficiency.
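The exact recipe is not recorded here; one common post-training approach with PyTorch's built-in framework is dynamic quantization of the linear layers, sketched below:
```python
import torch

# Convert linear-layer weights to int8 for CPU inference;
# activations are quantized dynamically at runtime
quantized_model = torch.quantization.quantize_dynamic(
    model.cpu(), {torch.nn.Linear}, dtype=torch.qint8
)
```
Dynamic quantization shrinks the stored weights and speeds up CPU inference, at the cost of the minor accuracy degradation noted under Limitations.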
Repository Structure
```
.
├── config.json
├── tokenizer_config.json
├── special_tokens_map.json
├── tokenizer.json
├── model.safetensors        # Fine-tuned model weights
└── README.md                # Model documentation
```
Limitations
The model may not generalize well to domains outside the fine-tuning dataset.
Quantization may result in minor accuracy degradation compared to full-precision models.
Contributing
Contributions are welcome! Feel free to open an issue or submit a pull request if you have suggestions or improvements.