
BART-Based Text Summarization Model for News Aggregation

This repository hosts a BART transformer model fine-tuned for abstractive text summarization of news articles. It is designed to condense lengthy news reports into concise, informative summaries, enhancing user experience for news readers and aggregators.

Model Details

  • Model Architecture: BART (Facebook's BART-base)
  • Task: Abstractive Text Summarization
  • Domain: News Articles
  • Dataset: Reddit-TIFU (Hugging Face Datasets)
  • Fine-tuning Framework: Hugging Face Transformers

Usage

Installation

pip install datasets transformers rouge-score evaluate

Loading the Model

from transformers import BartTokenizer, BartForConditionalGeneration
import torch

# Load tokenizer and model
device = 'cuda' if torch.cuda.is_available() else 'cpu'
model_name = "facebook/bart-base"  # replace with this repository's model ID to load the fine-tuned weights
tokenizer = BartTokenizer.from_pretrained(model_name)
model = BartForConditionalGeneration.from_pretrained(model_name).to(device)
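
Generating a Summary

A minimal sketch of producing a summary with the loaded model; the article text and generation parameters below are illustrative rather than the exact settings used for this model.

article = (
    "The city council voted on Tuesday to approve a new transit plan "
    "that expands bus routes and extends the light-rail network."
)

# Tokenize and truncate the article to the model's maximum input length
inputs = tokenizer(article, max_length=1024, truncation=True, return_tensors="pt").to(device)

# Generate an abstractive summary with beam search
summary_ids = model.generate(
    **inputs,
    max_length=128,
    num_beams=4,
    early_stopping=True,
)
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))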

Performance Metrics

  • ROUGE-1: 25.50
  • ROUGE-2: 7.86
  • ROUGE-L: 20.64
  • ROUGE-Lsum: 21.18
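
Scores of this kind can be computed with the evaluate library listed in the installation step; a minimal sketch with placeholder strings standing in for the generated and reference summaries:

import evaluate

rouge = evaluate.load("rouge")

# Placeholders: in practice, predictions are the model's generated summaries
# and references are the gold summaries from the held-out split.
predictions = ["the council approved a new transit plan"]
references = ["city council approves transit plan with expanded bus routes"]

scores = rouge.compute(predictions=predictions, references=references)
print(scores)  # rouge1, rouge2, rougeL, rougeLsum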

Fine-Tuning Details

Dataset

The dataset is sourced from Hugging Face’s Reddit-TIFU dataset. It contains roughly 79,000 Reddit posts paired with their summaries. The original training and testing sets were merged, shuffled, and re-split using a 90/10 ratio.
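
A minimal sketch of loading the dataset and reproducing the 90/10 re-split; the configuration name, seed, and trust_remote_code flag are assumptions and may need adjusting for your datasets version.

from datasets import load_dataset, concatenate_datasets

# Load Reddit-TIFU (configuration name assumed; "short" is the ~79k-post variant)
dataset = load_dataset("reddit_tifu", "short", trust_remote_code=True)

# Merge all available splits, shuffle, and re-split 90/10
merged = concatenate_datasets([dataset[split] for split in dataset.keys()])
splits = merged.shuffle(seed=42).train_test_split(test_size=0.1)
train_ds, test_ds = splits["train"], splits["test"]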

Training Configuration

  • Epochs: 3
  • Batch Size: 8
  • Learning Rate: 2e-5
  • Evaluation Strategy: epoch
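
These settings map onto a standard Hugging Face Trainer setup; the sketch below builds on the tokenizer, model, and train_ds/test_ds splits from the sections above, and the column names and maximum lengths are assumptions.

from transformers import Trainer, TrainingArguments, DataCollatorForSeq2Seq

# Tokenize posts and summaries (column names and max lengths are assumptions)
def preprocess(batch):
    model_inputs = tokenizer(batch["documents"], max_length=1024, truncation=True)
    labels = tokenizer(text_target=batch["tldr"], max_length=128, truncation=True)
    model_inputs["labels"] = labels["input_ids"]
    return model_inputs

tokenized_train = train_ds.map(preprocess, batched=True)
tokenized_test = test_ds.map(preprocess, batched=True)

training_args = TrainingArguments(
    output_dir="bart-news-summarization",
    num_train_epochs=3,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    learning_rate=2e-5,
    eval_strategy="epoch",   # evaluation_strategy on older transformers versions
    save_strategy="epoch",
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_train,
    eval_dataset=tokenized_test,
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
)
trainer.train()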

Quantization

Post-training quantization was applied using PyTorch's built-in quantization framework to reduce the model size and improve inference efficiency.
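
One common approach with PyTorch's framework is dynamic int8 quantization of the linear layers; the sketch below is illustrative and not necessarily the exact procedure used for this model.

import torch

# Dynamic quantization runs on CPU and converts Linear layers to int8
quantized_model = torch.quantization.quantize_dynamic(
    model.to("cpu"),
    {torch.nn.Linear},
    dtype=torch.qint8,
)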

Repository Structure

.
├── config.json
├── tokenizer_config.json
├── special_tokens_map.json
├── tokenizer.json
├── model.safetensors    # Fine-tuned model weights
├── README.md            # Model documentation

Limitations

  • The model may not generalize well to domains outside the fine-tuning dataset.

  • Quantization may result in minor accuracy degradation compared to full-precision models.

Contributing

Contributions are welcome! Feel free to open an issue or submit a pull request if you have suggestions or improvements.
