T5-Based Grammar Correction Model for Writing Assistance

This repository contains a fine-tuned T5-base transformer model designed for grammatical error correction, optimized for writing assistance applications. The model identifies and corrects grammar issues in English sentences to improve fluency, clarity, and correctness.

Model Details

  • Model Architecture: T5 (Text-to-Text Transfer Transformer)
  • Task: Grammar Correction
  • Domain: General English writing (academic, informal, learner-written)
  • Dataset: JFLEG (via Hugging Face Datasets)
  • Fine-tuning Framework: Hugging Face Transformers

Usage

Installation

pip install datasets transformers evaluate torch sentencepiece

Loading the Model

from transformers import T5Tokenizer, T5ForConditionalGeneration

# The repository ships its own tokenizer files, so load the tokenizer
# and the model from the same fine-tuned checkpoint
model_path = "path/to/your/fine-tuned-model"
tokenizer = T5Tokenizer.from_pretrained(model_path)
model = T5ForConditionalGeneration.from_pretrained(model_path)

def correct_grammar(text, max_length=128):
    # The model expects the "fix: " task prefix used during fine-tuning
    input_text = "fix: " + text
    inputs = tokenizer(input_text, return_tensors="pt", truncation=True).to(model.device)
    # Pass the attention mask along with the input IDs
    output_ids = model.generate(**inputs, max_length=max_length)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
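
Example usage (the corrected output shown is illustrative):

print(correct_grammar("She no went to the market."))
# e.g. "She didn't go to the market."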

Performance Metrics

  • ROUGE-1: 53.42
  • ROUGE-2: 28.76
  • ROUGE-L: 49.89
  • Scores are from a single fine-tuning run; exact numbers depend on the training configuration and data split.
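
Scores like these can be computed with the evaluate library. The sketch below reuses the correct_grammar helper defined above; the source sentences and references are placeholders, and recent evaluate versions return the scores as plain floats:

import evaluate

rouge = evaluate.load("rouge")

# Placeholder sentences; substitute the JFLEG validation data in practice
sources = ["She no went to the market."]
references = ["She didn't go to the market."]

predictions = [correct_grammar(s) for s in sources]
scores = rouge.compute(predictions=predictions, references=references)
print({k: round(v * 100, 2) for k, v in scores.items()})  # rouge1, rouge2, rougeL, rougeLsum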

Fine-Tuning Details

Dataset

The model was fine-tuned on the JFLEG dataset (via Hugging Face), a benchmark for fluency-based grammatical error correction. Each entry contains:

  • A source sentence written by English learners
  • One or more fluent reference corrections

The dataset was split into train and validation sets using a 90/10 split ratio.
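
As a minimal sketch, one way to produce such a split with Hugging Face Datasets (the JFLEG release on the Hub ships validation and test splits rather than a dedicated train split, so the exact split logic and seed here are assumptions):

from datasets import load_dataset

# Each JFLEG example has a "sentence" and a list of reference "corrections"
raw = load_dataset("jhu-clsp/jfleg", split="validation")
splits = raw.train_test_split(test_size=0.1, seed=42)  # 90/10 split; seed is an assumption
train_ds, val_ds = splits["train"], splits["test"]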

Training Configuration

  • Epochs: 3
  • Batch Size: 8
  • Learning Rate: 3e-4
  • Evaluation Strategy: epoch
  • Model: t5-base
  • Max Input Length: 256
  • Max Target Length: 64
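
A minimal sketch of how these settings might map onto Seq2SeqTrainer, reusing the train_ds/val_ds split and tokenizer from the snippets above; using the first reference correction as the target is an assumption:

from transformers import (DataCollatorForSeq2Seq, Seq2SeqTrainer,
                          Seq2SeqTrainingArguments)

def preprocess(batch):
    # Prepend the task prefix and apply the max lengths listed above
    model_inputs = tokenizer(["fix: " + s for s in batch["sentence"]],
                             max_length=256, truncation=True)
    labels = tokenizer([c[0] for c in batch["corrections"]],  # first reference (assumption)
                       max_length=64, truncation=True)
    model_inputs["labels"] = labels["input_ids"]
    return model_inputs

args = Seq2SeqTrainingArguments(
    output_dir="t5-grammar-correction",
    num_train_epochs=3,
    per_device_train_batch_size=8,
    learning_rate=3e-4,
    evaluation_strategy="epoch",  # named eval_strategy in newer transformers releases
    predict_with_generate=True,
)

trainer = Seq2SeqTrainer(
    model=model,
    args=args,
    train_dataset=train_ds.map(preprocess, batched=True),
    eval_dataset=val_ds.map(preprocess, batched=True),
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
)
trainer.train()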

Repository Structure

.
├── config.json              # Model configuration
├── tokenizer_config.json    # Tokenizer settings
├── special_tokens_map.json  # Special token mapping
├── tokenizer.json           # Tokenizer vocabulary
├── model.safetensors        # Fine-tuned model weights
└── README.md                # Model documentation

Limitations

  • May not generalize well to domain-specific text (e.g., legal, scientific).
  • Complex grammatical restructuring or rephrasing may be imperfect.
  • English only (based on JFLEG data).

Contributing

Contributions and suggestions are welcome! Please open an issue or submit a pull request for improvements or extensions (e.g., multilingual support, UI integration).
