---
language: en
license: apache-2.0
datasets:
- custom
tags:
- summarization
- flan-t5
- youtube
- fine-tuned
base_model: google/flan-t5-base
model-index:
- name: Flan T5 YouTube Summarizer
  results: []
metrics:
- rouge
pipeline_tag: summarization
---
# Flan-T5 YouTube Summarizer

This is a fine-tuned [flan-t5-base](https://huggingface.co/google/flan-t5-base) model for abstractive summarization of YouTube video transcripts. It was fine-tuned on a custom dataset of video transcripts paired with manually written summaries.

---
## Model Details

- **Base Model**: [flan-t5-base](https://huggingface.co/google/flan-t5-base)
- **Task**: Abstractive summarization
- **Training Data**: YouTube video transcripts and human-written summaries
- **Max Input Length**: 512 tokens (longer transcripts must be truncated or chunked; see the sketch after this list)
- **Max Output Length**: 256 tokens
- **Fine-tuning Epochs**: 5
- **Tokenizer**: T5Tokenizer (pretrained)
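Because the encoder input is capped at 512 tokens, a full YouTube transcript often will not fit in a single pass. The sketch below is not part of the released training or inference code; it shows one naive workaround: split the tokenized transcript into fixed-size chunks, summarize each chunk, and join the partial summaries. The helper name `summarize_long_transcript` and the 400-token chunk size (leaving headroom for the `summarize: ` prefix) are illustrative assumptions.

```python
from transformers import T5ForConditionalGeneration, T5Tokenizer

model_id = "bilal521/flan-t5-youtube-summarizer"
model = T5ForConditionalGeneration.from_pretrained(model_id)
tokenizer = T5Tokenizer.from_pretrained(model_id)

def summarize_long_transcript(text: str, chunk_tokens: int = 400) -> str:
    """Summarize a transcript longer than the 512-token input limit, chunk by chunk."""
    # Tokenize once without special tokens so chunk boundaries are clean.
    ids = tokenizer.encode(text, add_special_tokens=False)
    chunks = [ids[i:i + chunk_tokens] for i in range(0, len(ids), chunk_tokens)]

    partial_summaries = []
    for chunk in chunks:
        chunk_text = tokenizer.decode(chunk, skip_special_tokens=True)
        inputs = tokenizer.encode(
            "summarize: " + chunk_text,
            return_tensors="pt",
            max_length=512,
            truncation=True,
        )
        summary_ids = model.generate(inputs, max_length=256, num_beams=5, early_stopping=True)
        partial_summaries.append(tokenizer.decode(summary_ids[0], skip_special_tokens=True))

    return " ".join(partial_summaries)
```

If the joined result is still long, a second pass of the same model over the concatenated partial summaries is a common way to tighten it further.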
---
## Intended Use

This model is designed to generate short, informative summaries from long transcripts of educational or conceptual YouTube videos. It can be used for:

- Quick understanding of long videos
- Automated content summaries for blogs, platforms, or note-taking tools (see the transcript-fetching sketch below)
- Enhancing accessibility for long-form spoken content
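For end-to-end use on a real video, the transcript has to come from somewhere. Below is a minimal sketch that pairs the summarizer with the third-party `youtube-transcript-api` package (an assumption, not something this model card ships with), using its classic `YouTubeTranscriptApi.get_transcript` interface; the video ID is a placeholder.

```python
from transformers import T5ForConditionalGeneration, T5Tokenizer
from youtube_transcript_api import YouTubeTranscriptApi  # pip install youtube-transcript-api

model_id = "bilal521/flan-t5-youtube-summarizer"
model = T5ForConditionalGeneration.from_pretrained(model_id)
tokenizer = T5Tokenizer.from_pretrained(model_id)

video_id = "VIDEO_ID"  # placeholder: the ID from a YouTube URL

# Fetch the caption segments and join them into one transcript string.
segments = YouTubeTranscriptApi.get_transcript(video_id)
transcript = " ".join(segment["text"] for segment in segments)

# Summarize as in the usage example below (inputs beyond 512 tokens are truncated).
inputs = tokenizer.encode("summarize: " + transcript, return_tensors="pt", max_length=512, truncation=True)
summary_ids = model.generate(inputs, max_length=256, min_length=80, num_beams=5, early_stopping=True)
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))
```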
---
## How to Use

```python
from transformers import T5ForConditionalGeneration, T5Tokenizer

# Load the fine-tuned model and its tokenizer
model = T5ForConditionalGeneration.from_pretrained("bilal521/flan-t5-youtube-summarizer")
tokenizer = T5Tokenizer.from_pretrained("bilal521/flan-t5-youtube-summarizer")

# Example transcript text (replace with a full video transcript)
text = "The video talks about coordinate covalent bonds, giving examples from..."

# Preprocess and summarize
inputs = tokenizer.encode("summarize: " + text, return_tensors="pt", max_length=512, truncation=True)

summary_ids = model.generate(
    inputs,
    max_length=256,
    min_length=80,
    num_beams=5,
    length_penalty=2.0,
    no_repeat_ngram_size=3,
    early_stopping=True
)

summary = tokenizer.decode(summary_ids[0], skip_special_tokens=True)
print(summary)
```
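The checkpoint can also be loaded through the Transformers `pipeline` API, which wraps tokenization, generation, and decoding in one call. A minimal sketch (the `summarize: ` prefix is added manually to mirror the example above):

```python
from transformers import pipeline

# Summarization pipeline backed by the same fine-tuned checkpoint.
summarizer = pipeline("summarization", model="bilal521/flan-t5-youtube-summarizer")

text = "The video talks about coordinate covalent bonds, giving examples from..."
result = summarizer(
    "summarize: " + text,
    max_length=256,
    min_length=80,
    num_beams=5,
    no_repeat_ngram_size=3,
)
print(result[0]["summary_text"])
```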
## Evaluation

| Metric          | Value       |
| --------------- | ----------- |
| ROUGE-1         | ~0.61       |
| ROUGE-2         | ~0.27       |
| ROUGE-L         | ~0.48       |
| Avg. Gen Length | ~187 tokens |
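The reported scores are approximate. To compute comparable ROUGE numbers on your own transcript/summary pairs, one option is the Hugging Face `evaluate` library; the sketch below uses placeholder lists rather than this card's evaluation data.

```python
import evaluate  # pip install evaluate rouge_score

rouge = evaluate.load("rouge")

# Placeholders: model outputs and the corresponding human-written reference summaries.
predictions = ["generated summary for video 1", "generated summary for video 2"]
references = ["reference summary for video 1", "reference summary for video 2"]

scores = rouge.compute(predictions=predictions, references=references)
print(scores)  # e.g. {'rouge1': ..., 'rouge2': ..., 'rougeL': ..., 'rougeLsum': ...}
```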
## Citation

If you use this model in your work, consider citing:

```bibtex
@misc{t5ytsummarizer2025,
  title={Flan T5 YouTube Transcript Summarizer},
  author={Muhammad Bilal Yousaf},
  year={2025},
  howpublished={\url{https://huggingface.co/bilal521/flan-t5-youtube-summarizer}},
}
```