T5-Based Grammar Correction Model for Writing Assistance
This repository contains a fine-tuned T5-base transformer model designed for grammatical error correction, optimized for writing assistance applications. The model identifies and corrects grammar issues in English sentences to improve fluency, clarity, and correctness.
Model Details
- Model Architecture: T5 (Text-to-Text Transfer Transformer)
- Task: Grammar Correction
- Domain: General English writing (academic, informal, learner-written)
- Dataset: JFLEG (via Hugging Face Datasets)
- Fine-tuning Framework: Hugging Face Transformers
Usage
Installation
```bash
pip install datasets transformers evaluate
```
Loading the Model
```python
from transformers import T5Tokenizer, T5ForConditionalGeneration

# Load the tokenizer and the fine-tuned model from the same directory,
# which contains the tokenizer files listed under Repository Structure.
model_path = "path/to/your/fine-tuned-model"
tokenizer = T5Tokenizer.from_pretrained(model_path)
model = T5ForConditionalGeneration.from_pretrained(model_path)

def correct_grammar(text, max_length=128):
    # The model was fine-tuned with a "fix: " task prefix.
    input_text = "fix: " + text
    inputs = tokenizer(input_text, return_tensors="pt", truncation=True, padding=True).to(model.device)
    output_ids = model.generate(
        inputs["input_ids"],
        attention_mask=inputs["attention_mask"],
        max_length=max_length,
    )
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```
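For example (the corrected output shown in the comment is illustrative):

```python
print(correct_grammar("He go to school every days."))
# Expected: a fluent correction such as "He goes to school every day."
```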
Performance Metrics
- ROUGE-1: 53.42
- ROUGE-2: 28.76
- ROUGE-L: 49.89
Exact numbers depend on fine-tuning details.
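As a sanity check, ROUGE can be recomputed with the `evaluate` library installed above; the sentence/reference pair below is illustrative, and `correct_grammar` is the helper defined earlier:

```python
import evaluate

rouge = evaluate.load("rouge")

# Illustrative pair; JFLEG provides one or more references per source sentence.
predictions = [correct_grammar("He go to school every days.")]
references = [["He goes to school every day."]]
print(rouge.compute(predictions=predictions, references=references))
```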
Fine-Tuning Details
Dataset
The model was fine-tuned on the JFLEG dataset (via Hugging Face), a benchmark for fluency-based grammatical error correction. Each entry contains:
- A source sentence written by English learners
- One or more fluent reference corrections
The dataset was split into train and validation sets using a 90/10 split ratio.
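A minimal sketch of that preparation, assuming the `jhu-clsp/jfleg` dataset identifier on the Hugging Face Hub:

```python
from datasets import load_dataset

# JFLEG pairs each learner-written "sentence" with a list of fluent "corrections".
dataset = load_dataset("jhu-clsp/jfleg", split="validation")

# 90/10 train/validation split, as described above.
split = dataset.train_test_split(test_size=0.1, seed=42)
train_ds, val_ds = split["train"], split["test"]
```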
Training Configuration
- Epochs: 3
- Batch Size: 8
- Learning Rate: 3e-4
- Evaluation Strategy: epoch
- Model: t5-base
- Max Input Length: 256
- Max Target Length: 64
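For reference, a hedged sketch of how this configuration maps onto Hugging Face `Seq2SeqTrainingArguments`; the output directory is a placeholder, and the max input length of 256 is applied at tokenization time rather than here:

```python
from transformers import Seq2SeqTrainingArguments

# Illustrative mapping of the configuration above; output_dir is a placeholder.
training_args = Seq2SeqTrainingArguments(
    output_dir="t5-base-jfleg",
    num_train_epochs=3,
    per_device_train_batch_size=8,
    learning_rate=3e-4,
    eval_strategy="epoch",        # `evaluation_strategy` on older transformers versions
    predict_with_generate=True,   # generate sequences during eval so ROUGE can be computed
    generation_max_length=64,     # matches the max target length above
)
```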
Repository Structure
```
.
├── config.json
├── tokenizer_config.json
├── special_tokens_map.json
├── tokenizer.json
├── model.safetensors        # Fine-tuned model weights
└── README.md                # Model documentation
```
Limitations
- May not generalize well to domain-specific text (e.g., legal, scientific).
- Complex grammatical restructuring or rephrasing may be imperfect.
- English only (based on JFLEG data).
Contributing
Contributions and suggestions are welcome! Please open an issue or submit a pull request for improvements or extensions (e.g., multilingual support, UI integration).