Commit 1381a59 (verified) by vishal1364 · 1 Parent(s): bf4a293

Create README.md

  1. README.md +113 -0
README.md ADDED
@@ -0,0 +1,113 @@
+ # 🧠 Text Summarization for Product Descriptions
+
+ A **T5-small-based** abstractive summarization model fine-tuned on synthetic product description data. This model generates concise summaries of detailed product descriptions, ideal for catalog optimization, e-commerce listings, and content generation.
+
+ ---
+
+ ## ✨ Model Highlights
+
+ - 📌 Based on [`t5-small`](https://huggingface.co/t5-small)
+ - 🧪 Fine-tuned on a synthetic dataset of 50+ product descriptions and their summaries
+ - ⚡ Supports **abstractive summarization** of English product texts
+ - 🧠 Built using **Hugging Face Transformers** and **PyTorch**
+
+ ---
+
+ ## 🧠 Intended Uses
+
+ - ✅ Auto-generating product summaries for catalogs or online listings
+ - ✅ Shortening verbose product descriptions for UI-friendly displays
+ - ✅ Content creation support for e-commerce and marketing
+
+ ---
+
+ ## 🚫 Limitations
+
+ - ❌ English-only (not trained for multilingual input)
+ - 🧠 Cannot fact-check or verify real-world product details
+ - 🧪 Trained on synthetic data; real-world generalization may be limited
+ - ⚠️ May generate generic or repetitive summaries for complex inputs
+
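One way to spot the repetitive summaries mentioned above is to check generated text for repeated word n-grams before publishing it. The sketch below is a minimal illustration, not part of this repository: the helper name `repeated_ngrams` and the sample summary are hypothetical. Transformers' `generate` also accepts a `no_repeat_ngram_size` argument that suppresses such repeats at decoding time, though the usage example in this card does not set it.

```python
def repeated_ngrams(text, n=3):
    """Return the set of word n-grams that occur more than once in text."""
    tokens = text.lower().split()
    seen, repeated = set(), set()
    for i in range(len(tokens) - n + 1):
        gram = tuple(tokens[i:i + n])
        if gram in seen:
            repeated.add(gram)  # second (or later) occurrence of this n-gram
        seen.add(gram)
    return repeated


# Hypothetical model output containing a repeated phrase
summary = "a sleek kettle with a sleek kettle with auto shut-off"
print(repeated_ngrams(summary))
```

An empty result means no trigram was repeated; a non-empty one flags a summary that may be worth regenerating or reviewing by hand.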
+ ---
+
+ ## 🏋️‍♂️ Training Details
+
+ | Attribute        | Value                                                      |
+ |------------------|------------------------------------------------------------|
+ | Base Model       | `t5-small`                                                 |
+ | Dataset          | Custom synthetic CSV of product descriptions and summaries |
+ | Input Field      | `product_description`                                      |
+ | Target Field     | `summary`                                                  |
+ | Max Token Length | 512 input / 64 summary                                     |
+ | Epochs           | 3                                                          |
+ | Batch Size       | 4                                                          |
+ | Optimizer        | AdamW                                                      |
+ | Loss Function    | CrossEntropyLoss (via `Trainer`)                           |
+ | Framework        | PyTorch + Transformers                                     |
+ | Hardware         | CUDA-enabled GPU                                           |
+
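To make the input and target fields concrete, here is a minimal sketch of how rows from such a CSV could be turned into (input, target) text pairs before tokenization. It is not the repository's training script: the two sample rows are invented, `build_pairs` is a hypothetical helper, and it assumes the `summarize: ` prefix from the usage example was also applied during training.

```python
import csv
import io

PREFIX = "summarize: "  # assumption: same task prefix as the inference example

# Hypothetical two-row sample; the real product_descriptions.csv is not shown here.
SAMPLE_CSV = """product_description,summary
"Stainless steel water bottle with double-wall insulation, keeps drinks cold for 24 hours.",Insulated steel bottle that keeps drinks cold for 24 hours.
"Wireless mouse with ergonomic grip, adjustable DPI, and 12-month battery life.",Ergonomic wireless mouse with long battery life.
"""


def build_pairs(csv_text):
    """Turn CSV rows into (model_input, target) string pairs."""
    reader = csv.DictReader(io.StringIO(csv_text))
    pairs = []
    for row in reader:
        description = row["product_description"].strip()
        summary = row["summary"].strip()
        if not description or not summary:
            continue  # skip rows missing either field
        pairs.append((PREFIX + description, summary))
    return pairs


pairs = build_pairs(SAMPLE_CSV)
for source, target in pairs:
    print(source, "->", target)
```

In a real run, each pair would then be tokenized with the limits from the table (512 tokens for the input, 64 for the summary).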
+ ---
+
+ ## 📊 Evaluation Metrics
+
+ | Metric     | Score (Synthetic Eval) |
+ |------------|------------------------|
+ | ROUGE-1    | 24.49                  |
+ | ROUGE-2    | 22.10                  |
+ | ROUGE-L    | 24.47                  |
+ | ROUGE-Lsum | 24.46                  |
+
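For readers unfamiliar with ROUGE, the sketch below shows what ROUGE-N measures: the F1 score over n-grams shared by a reference summary and a generated one. It is a simplified illustration with whitespace tokenization and no stemming, so it will not reproduce the scores in the table; the card does not say which tool produced them. The function names and example sentences are hypothetical.

```python
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(reference, candidate, n=1):
    """Simplified ROUGE-N F1 between two strings, in the range 0 to 1."""
    ref = ngrams(reference.lower().split(), n)
    cand = ngrams(candidate.lower().split(), n)
    overlap = sum((ref & cand).values())  # clipped count of shared n-grams
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


reference = "electric kettle with fast boil and auto shut-off"
candidate = "fast boil electric kettle with auto shut-off"
print("ROUGE-1:", round(100 * rouge_n(reference, candidate, 1), 2))
print("ROUGE-2:", round(100 * rouge_n(reference, candidate, 2), 2))
```

The values are multiplied by 100 here on the assumption that the table reports scores on a 0 to 100 scale.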
+ ---
+
+ ## 🚀 Usage
+
+ ```python
+ from transformers import T5Tokenizer, T5ForConditionalGeneration
+ import torch
+
+ model_name = "your-username/Text-Summarization-for-Product-Descriptions"
+ tokenizer = T5Tokenizer.from_pretrained(model_name)
+ model = T5ForConditionalGeneration.from_pretrained(model_name)
+ model.eval()  # inference mode: disables dropout
+
+ def summarize(text, model, tokenizer, max_input_length=512, max_output_length=64):
+     device = next(model.parameters()).device  # device the model lives on (cpu or cuda)
+     input_text = "summarize: " + text.strip()  # T5 task prefix
+     inputs = tokenizer(
+         input_text,
+         return_tensors="pt",
+         truncation=True,
+         padding="max_length",
+         max_length=max_input_length,
+     ).to(device)  # move inputs to the same device as the model
+
+     with torch.no_grad():
+         summary_ids = model.generate(
+             input_ids=inputs["input_ids"],
+             attention_mask=inputs["attention_mask"],
+             max_length=max_output_length,
+             num_beams=4,
+             early_stopping=True,
+         )
+
+     return tokenizer.decode(summary_ids[0], skip_special_tokens=True)
+
+
+ # Example
+ text = "This sleek electric kettle features a 1.7-liter capacity, fast-boil tech, auto shut-off, and a 360-degree swivel base."
+ print("Summary:", summarize(text, model, tokenizer))
+ ```
+
+ ## 📁 Repository Structure
+
+ ```
+ .
+ ├── model/                     # Fine-tuned model files (pytorch_model.bin, config.json)
+ ├── tokenizer/                 # Tokenizer config and vocab
+ ├── training_script.py         # Training code
+ ├── product_descriptions.csv   # Source dataset
+ ├── utils.py                   # Preprocessing & summarization utilities
+ └── README.md                  # Model card
+ ```
+
+ ## 🤝 Contributing
+
+ Feel free to raise issues or suggest improvements via pull requests. Further training on real-world data and multilingual support are planned for future updates.