---
language: en
license: apache-2.0
datasets:
- custom
tags:
- summarization
- flan-t5
- youtube
- fine-tuned
base_model: google/flan-t5-base
model-index:
- name: Flan T5 YouTube Summarizer
  results: []
metrics:
- rouge
pipeline_tag: summarization
---
# Flan-T5 YouTube Summarizer

This is a fine-tuned [flan-t5-base](https://huggingface.co/google/flan-t5-base) model for abstractive summarization of YouTube video transcripts. It was fine-tuned on a custom dataset of video transcripts paired with manually written summaries.

---
## Model Details

- **Base Model**: [flan-t5-base](https://huggingface.co/google/flan-t5-base)
- **Task**: Abstractive summarization
- **Training Data**: YouTube video transcripts and human-written summaries
- **Max Input Length**: 512 tokens (longer transcripts must be truncated or chunked; see the sketch after this list)
- **Max Output Length**: 256 tokens
- **Fine-tuning Epochs**: 5
- **Tokenizer**: T5Tokenizer (pretrained)
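Because the encoder input is capped at 512 tokens, a full YouTube transcript often will not fit in a single pass. The sketch below is not part of the released training or inference code; it shows one naive workaround: split the tokenized transcript into fixed-size chunks, summarize each chunk, and join the partial summaries. The helper name `summarize_long_transcript` and the 400-token chunk size (leaving headroom for the `summarize: ` prefix) are illustrative assumptions.

```python
from transformers import T5ForConditionalGeneration, T5Tokenizer

model_id = "bilal521/flan-t5-youtube-summarizer"
model = T5ForConditionalGeneration.from_pretrained(model_id)
tokenizer = T5Tokenizer.from_pretrained(model_id)

def summarize_long_transcript(text: str, chunk_tokens: int = 400) -> str:
    """Summarize a transcript longer than the 512-token input limit, chunk by chunk."""
    # Tokenize once without special tokens so chunk boundaries are clean.
    ids = tokenizer.encode(text, add_special_tokens=False)
    chunks = [ids[i:i + chunk_tokens] for i in range(0, len(ids), chunk_tokens)]

    partial_summaries = []
    for chunk in chunks:
        chunk_text = tokenizer.decode(chunk, skip_special_tokens=True)
        inputs = tokenizer.encode(
            "summarize: " + chunk_text,
            return_tensors="pt",
            max_length=512,
            truncation=True,
        )
        summary_ids = model.generate(inputs, max_length=256, num_beams=5, early_stopping=True)
        partial_summaries.append(tokenizer.decode(summary_ids[0], skip_special_tokens=True))

    return " ".join(partial_summaries)
```

If the joined result is still long, a second pass of the same model over the concatenated partial summaries is a common way to tighten it further.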
---
## Intended Use

This model is designed to generate short, informative summaries from long transcripts of educational or conceptual YouTube videos. It can be used for:

- Quick understanding of long videos
- Automated content summaries for blogs, platforms, or note-taking tools (see the transcript-fetching sketch below)
- Enhancing accessibility for long-form spoken content
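For end-to-end use on a real video, the transcript has to come from somewhere. Below is a minimal sketch that pairs the summarizer with the third-party `youtube-transcript-api` package (an assumption, not something this model card ships with), using its classic `YouTubeTranscriptApi.get_transcript` interface; the video ID is a placeholder.

```python
from transformers import T5ForConditionalGeneration, T5Tokenizer
from youtube_transcript_api import YouTubeTranscriptApi  # pip install youtube-transcript-api

model_id = "bilal521/flan-t5-youtube-summarizer"
model = T5ForConditionalGeneration.from_pretrained(model_id)
tokenizer = T5Tokenizer.from_pretrained(model_id)

video_id = "VIDEO_ID"  # placeholder: the ID from a YouTube URL

# Fetch the caption segments and join them into one transcript string.
segments = YouTubeTranscriptApi.get_transcript(video_id)
transcript = " ".join(segment["text"] for segment in segments)

# Summarize as in the usage example below (inputs beyond 512 tokens are truncated).
inputs = tokenizer.encode("summarize: " + transcript, return_tensors="pt", max_length=512, truncation=True)
summary_ids = model.generate(inputs, max_length=256, min_length=80, num_beams=5, early_stopping=True)
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))
```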
---
## How to Use

```python
from transformers import T5ForConditionalGeneration, T5Tokenizer

# Load the fine-tuned model and its tokenizer
model = T5ForConditionalGeneration.from_pretrained("bilal521/flan-t5-youtube-summarizer")
tokenizer = T5Tokenizer.from_pretrained("bilal521/flan-t5-youtube-summarizer")

# Example transcript text (replace with a full video transcript)
text = "The video talks about coordinate covalent bonds, giving examples from..."

# Preprocess and summarize
inputs = tokenizer.encode("summarize: " + text, return_tensors="pt", max_length=512, truncation=True)

summary_ids = model.generate(
    inputs,
    max_length=256,
    min_length=80,
    num_beams=5,
    length_penalty=2.0,
    no_repeat_ngram_size=3,
    early_stopping=True
)

summary = tokenizer.decode(summary_ids[0], skip_special_tokens=True)
print(summary)
```
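The checkpoint can also be loaded through the Transformers `pipeline` API, which wraps tokenization, generation, and decoding in one call. A minimal sketch (the `summarize: ` prefix is added manually to mirror the example above):

```python
from transformers import pipeline

# Summarization pipeline backed by the same fine-tuned checkpoint.
summarizer = pipeline("summarization", model="bilal521/flan-t5-youtube-summarizer")

text = "The video talks about coordinate covalent bonds, giving examples from..."
result = summarizer(
    "summarize: " + text,
    max_length=256,
    min_length=80,
    num_beams=5,
    no_repeat_ngram_size=3,
)
print(result[0]["summary_text"])
```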
## Evaluation

| Metric          | Value       |
| --------------- | ----------- |
| ROUGE-1         | ~0.61       |
| ROUGE-2         | ~0.27       |
| ROUGE-L         | ~0.48       |
| Avg. Gen Length | ~187 tokens |
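The reported scores are approximate. To compute comparable ROUGE numbers on your own transcript/summary pairs, one option is the Hugging Face `evaluate` library; the sketch below uses placeholder lists rather than this card's evaluation data.

```python
import evaluate  # pip install evaluate rouge_score

rouge = evaluate.load("rouge")

# Placeholders: model outputs and the corresponding human-written reference summaries.
predictions = ["generated summary for video 1", "generated summary for video 2"]
references = ["reference summary for video 1", "reference summary for video 2"]

scores = rouge.compute(predictions=predictions, references=references)
print(scores)  # e.g. {'rouge1': ..., 'rouge2': ..., 'rougeL': ..., 'rougeLsum': ...}
```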
## Citation

If you use this model in your work, consider citing:

```bibtex
@misc{t5ytsummarizer2025,
  title={Flan T5 YouTube Transcript Summarizer},
  author={Muhammad Bilal Yousaf},
  year={2025},
  howpublished={\url{https://huggingface.co/bilal521/flan-t5-youtube-summarizer}},
}
```