---
language: en
license: apache-2.0
datasets:
- custom
tags:
- summarization
- flan-t5
- youtube
- fine-tuned
base_model: google/flan-t5-base
model-index:
- name: Flan T5 YouTube Summarizer
  results: []
---

# 📺 Flan-T5 YouTube Summarizer

This is a fine-tuned [flan-t5-base](https://huggingface.co/google/flan-t5-base) model for abstractive summarization of YouTube video transcripts. It was trained on a custom dataset of video transcripts paired with manually written summaries.

---

## ✨ Model Details

- **Base Model**: [flan-t5-base](https://huggingface.co/google/flan-t5-base)
- **Task**: Abstractive Summarization
- **Training Data**: YouTube video transcripts and human-written summaries
- **Max Input Length**: 512 tokens
- **Max Output Length**: 256 tokens
- **Fine-tuning Epochs**: 10
- **Tokenizer**: T5Tokenizer (pretrained)

---

## 🧠 Intended Use

This model is designed to generate short, informative summaries from long transcripts of educational or conceptual YouTube videos. It can be used for:

- Quick understanding of long videos
- Automated content summaries for blogs, platforms, or note-taking tools
- Enhancing accessibility for long-form spoken content

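Because the model reads at most 512 input tokens (and the usage example truncates anything longer), the tail of a long transcript is silently dropped. One common workaround, which is not part of this model's documented training or evaluation setup, is to split the transcript into overlapping windows, summarize each window separately, and combine the partial summaries. The sketch below assumes a whitespace word count as a stand-in for the real token count (T5's SentencePiece tokenizer typically yields more tokens than words, so the default budget leaves headroom); `chunk_transcript` and its parameters are hypothetical.

```python
from typing import List


def chunk_transcript(text: str, max_words: int = 350, overlap: int = 50) -> List[str]:
    """Split a transcript into overlapping windows of at most `max_words` words.

    Hypothetical helper: whitespace words are only a rough proxy for T5 tokens,
    so the default stays well below the 512-token input limit.
    """
    if max_words <= 0:
        raise ValueError("max_words must be positive")
    if not 0 <= overlap < max_words:
        raise ValueError("overlap must satisfy 0 <= overlap < max_words")

    words = text.split()
    step = max_words - overlap  # how far each window advances
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        # Stop once this window already reaches the end of the transcript
        if start + max_words >= len(words):
            break
    return chunks


# Usage with a synthetic 1,000-word "transcript"
transcript = " ".join(f"w{i}" for i in range(1000))
chunks = chunk_transcript(transcript, max_words=350, overlap=50)
print(len(chunks))  # 4
```

Each chunk could then be passed through the same `"summarize: " + chunk` call used in the usage example. To budget by real tokens rather than words, `len(tokenizer.encode(chunk))` gives the actual count for a chunk.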

---

## 🚀 How to Use

```python
# Requires: transformers, sentencepiece (for T5Tokenizer) and torch (for return_tensors="pt")
from transformers import T5ForConditionalGeneration, T5Tokenizer

# Load the fine-tuned model and tokenizer from the Hugging Face Hub
model = T5ForConditionalGeneration.from_pretrained("bilal521/flan-t5-youtube-summarizer")
tokenizer = T5Tokenizer.from_pretrained("bilal521/flan-t5-youtube-summarizer")

# Define input text (the video transcript)
text = "The video talks about coordinate covalent bonds, giving examples from..."

# Add the T5 task prefix and truncate to the 512-token input limit
inputs = tokenizer.encode("summarize: " + text, return_tensors="pt", max_length=512, truncation=True)

# Generate the summary with beam search
summary_ids = model.generate(
    inputs,
    max_length=256,          # upper bound on summary length, in tokens
    min_length=80,           # end-of-sequence is blocked until 80 tokens
    num_beams=5,             # beam search width
    length_penalty=2.0,      # values > 0 favor longer beams
    no_repeat_ngram_size=3,  # any 3-gram may appear at most once
    early_stopping=True,     # stop once num_beams finished candidates exist
)

# Decode the token IDs back into text
summary = tokenizer.decode(summary_ids[0], skip_special_tokens=True)
print(summary)
```

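Two of the `generate` arguments in the usage example change how beam search ranks and extends candidates. The following is a simplified plain-Python illustration, not the Transformers implementation. Per the library's documentation, `length_penalty` is applied as an exponent to the sequence length, which then divides the beam's log-likelihood score; since log-probabilities are negative, a value above 0 favors longer summaries. `no_repeat_ngram_size=3` lets each 3-gram occur only once, so any token that would complete an already-seen 3-gram is excluded. The helper names `length_normalized_score` and `banned_next_tokens` are hypothetical.

```python
from typing import Dict, Sequence, Set, Tuple


def length_normalized_score(sum_logprobs: float, length: int, length_penalty: float) -> float:
    """Rank a finished beam by its summed log-probability divided by length ** length_penalty."""
    return sum_logprobs / (length ** length_penalty)


def banned_next_tokens(tokens: Sequence[int], n: int) -> Set[int]:
    """Return the token IDs that would complete an n-gram already present in `tokens`."""
    if n <= 0 or len(tokens) < n:
        return set()
    # Map every (n-1)-token prefix seen so far to the tokens that followed it
    followers: Dict[Tuple[int, ...], Set[int]] = {}
    for i in range(len(tokens) - n + 1):
        prefix = tuple(tokens[i:i + n - 1])
        followers.setdefault(prefix, set()).add(tokens[i + n - 1])
    # The last n-1 tokens are the prefix of the n-gram the next token would form
    current_prefix = tuple(tokens[len(tokens) - n + 1:])
    return followers.get(current_prefix, set())


# With length_penalty=2.0 the longer beam wins despite its lower raw log-probability
short_beam = length_normalized_score(-6.0, 10, 2.0)  # -6 / 100 = -0.06
long_beam = length_normalized_score(-9.0, 20, 2.0)   # -9 / 400 = -0.0225
print(long_beam > short_beam)  # True

# With no_repeat_ngram_size=3, token 9 cannot follow "5 7" a second time
print(banned_next_tokens([5, 7, 9, 5, 7], 3))  # {9}
```

With `length_penalty=0.0` the denominator is 1, so the raw log-probabilities decide and the shorter beam (-6.0 vs. -9.0) would win instead.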

---

## 📊 Evaluation

| Metric                 | Value        |
| ---------------------- | ------------ |
| ROUGE-1                | \~0.61       |
| ROUGE-2                | \~0.27       |
| ROUGE-L                | \~0.48       |
| Avg. generation length | \~187 tokens |


---

## 📌 Citation

If you use this model in your work, consider citing:

```bibtex
@misc{t5ytsummarizer2025,
  title={Flan T5 YouTube Transcript Summarizer},
  author={Muhammad Bilal Yousaf},
  year={2025},
  howpublished={\url{https://huggingface.co/bilal521/flan-t5-youtube-summarizer}},
}
```