AventIQ-AI/Text-Summarization-for-Product-Descriptions · Safetensors


Commit `1381a59` (verified) by vishal1364 on May 19 · Parent: `bf4a293` · Create README.md (README.md added, +113 -0)

# 🧠 Text Summarization for Product Descriptions

A **T5-small-based** abstractive summarization model fine-tuned on synthetic product description data. This model generates concise summaries of detailed product descriptions, ideal for catalog optimization, e-commerce listings, and content generation.

---

## ✨ Model Highlights

- 📌 Based on [`t5-small`](https://huggingface.co/t5-small)
- 🧪 Fine-tuned on a synthetic dataset of 50+ product descriptions and their summaries
- ⚡ Supports **abstractive summarization** of English product text
- 🧠 Built using **Hugging Face Transformers** and **PyTorch**

---

## 🧠 Intended Uses

- ✅ Auto-generating product summaries for catalogs or online listings
- ✅ Shortening verbose product descriptions for UI-friendly displays
- ✅ Content creation support for e-commerce and marketing

---

## 🚫 Limitations

- ❌ English-only (not trained on multilingual input)
- 🧠 Cannot fact-check or verify real-world product details
- 🧪 Trained on synthetic data, so generalization to real-world descriptions may be limited
- ⚠️ May generate generic or repetitive summaries for complex inputs

---

## 🏋️‍♂️ Training Details

| Attribute | Value |
|---|---|
| Base Model | `t5-small` |
| Dataset | Custom synthetic CSV of product descriptions and summaries |
| Input Field | `product_description` |
| Target Field | `summary` |
| Max Sequence Length | 512 tokens (input) / 64 tokens (summary) |
| Epochs | 3 |
| Batch Size | 4 |
| Optimizer | AdamW |
| Loss Function | CrossEntropyLoss (via `Trainer`) |
| Framework | PyTorch + Transformers |
| Hardware | CUDA-enabled GPU |
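
The input/target mapping in the table can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' `training_script.py`. It assumes a CSV whose columns are named `product_description` and `summary` (as listed above) and reuses the `summarize: ` task prefix from the Usage example. `build_pairs` is a hypothetical helper, and the sample rows are made up.

```python
import csv
import io
from typing import Iterable, List, Tuple

# T5 task prefix, matching the one used in the Usage example
TASK_PREFIX = "summarize: "


def build_pairs(rows: Iterable[dict]) -> List[Tuple[str, str]]:
    """Turn CSV rows into (model_input, target_summary) pairs.

    Hypothetical helper: assumes each row has `product_description`
    and `summary` columns; rows with an empty field are skipped.
    """
    pairs = []
    for row in rows:
        description = (row.get("product_description") or "").strip()
        summary = (row.get("summary") or "").strip()
        if not description or not summary:
            continue  # skip incomplete examples
        pairs.append((TASK_PREFIX + description, summary))
    return pairs


# In-memory CSV standing in for product_descriptions.csv (made-up rows)
sample_csv = io.StringIO(
    "product_description,summary\n"
    "\"Stainless steel bottle, keeps drinks cold for 24 hours.\",Insulated steel bottle.\n"
    ",Missing description\n"
)
pairs = build_pairs(csv.DictReader(sample_csv))
print(pairs)
```

Tokenization and the 512/64 token limits would typically come next. With the T5 tokenizer, that could be done with `tokenizer(inputs, max_length=512, truncation=True)` for the descriptions and `tokenizer(text_target=targets, max_length=64, truncation=True)` for the summaries.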

---

## 📊 Evaluation Metrics

| Metric | Score (synthetic eval set) |
|---|---|
| ROUGE-1 | 24.49 |
| ROUGE-2 | 22.10 |
| ROUGE-L | 24.47 |
| ROUGE-Lsum | 24.46 |
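
The ROUGE scores above compare generated summaries with reference summaries. ROUGE-N counts overlapping n-grams, and ROUGE-L uses the longest common subsequence (LCS). The sketch below computes F1 versions on a 0-100 scale, which appears to be the convention in the table. It is a simplified, hypothetical implementation: it lowercases the text, splits on whitespace, applies no stemming, and omits ROUGE-Lsum. Its numbers will therefore not exactly match library implementations such as Google's `rouge_score` package. The example sentences are made up.

```python
from collections import Counter
from typing import List


def _tokens(text: str) -> List[str]:
    # Simplified tokenization: lowercase + whitespace split (no stemming)
    return text.lower().split()


def _f1(overlap: int, pred_total: int, ref_total: int) -> float:
    # Harmonic mean of precision (vs. prediction) and recall (vs. reference)
    if overlap == 0 or pred_total == 0 or ref_total == 0:
        return 0.0
    precision = overlap / pred_total
    recall = overlap / ref_total
    return 2 * precision * recall / (precision + recall)


def rouge_n(prediction: str, reference: str, n: int) -> float:
    """ROUGE-N F1 on a 0-100 scale, using clipped n-gram counts."""
    def ngrams(tokens: List[str]) -> Counter:
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

    pred, ref = ngrams(_tokens(prediction)), ngrams(_tokens(reference))
    overlap = sum((pred & ref).values())  # Counter intersection clips matches
    return 100 * _f1(overlap, sum(pred.values()), sum(ref.values()))


def rouge_l(prediction: str, reference: str) -> float:
    """ROUGE-L F1 on a 0-100 scale, based on the LCS length."""
    p, r = _tokens(prediction), _tokens(reference)
    # Dynamic-programming table: table[i][j] = LCS of p[:i] and r[:j]
    table = [[0] * (len(r) + 1) for _ in range(len(p) + 1)]
    for i, p_tok in enumerate(p, 1):
        for j, r_tok in enumerate(r, 1):
            if p_tok == r_tok:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
    return 100 * _f1(table[-1][-1], len(p), len(r))


reference = "electric kettle with fast boil and auto shut-off"
prediction = "fast boil electric kettle with auto shut-off"
print(round(rouge_n(prediction, reference, 1), 2))  # → 93.33
print(round(rouge_n(prediction, reference, 2), 2))  # → 61.54
print(round(rouge_l(prediction, reference), 2))     # → 66.67
```

Clipped counts (the `Counter` intersection) prevent a word that repeats in the prediction from being credited more often than it occurs in the reference. The word-order change in the example costs far more under ROUGE-2 and ROUGE-L than under ROUGE-1.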

---

## 🚀 Usage

```python
from transformers import T5Tokenizer, T5ForConditionalGeneration
import torch

# Hugging Face Hub repo id of this model
model_name = "AventIQ-AI/Text-Summarization-for-Product-Descriptions"
tokenizer = T5Tokenizer.from_pretrained(model_name)  # needs the `sentencepiece` package
model = T5ForConditionalGeneration.from_pretrained(model_name)

# Use a GPU if one is available, then switch to inference mode
device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)
model.eval()


def summarize(text, model, tokenizer, max_input_length=512, max_output_length=64):
    device = next(model.parameters()).device  # run on whatever device the model is on
    input_text = "summarize: " + text.strip()  # T5 task prefix
    inputs = tokenizer(
        input_text,
        return_tensors="pt",
        truncation=True,  # clip inputs longer than max_input_length tokens
        max_length=max_input_length,
    ).to(device)  # move inputs to the model's device
    with torch.no_grad():
        summary_ids = model.generate(
            input_ids=inputs["input_ids"],
            attention_mask=inputs["attention_mask"],
            max_length=max_output_length,
            num_beams=4,
            early_stopping=True,
        )
    return tokenizer.decode(summary_ids[0], skip_special_tokens=True)


# Example
text = "This sleek electric kettle features a 1.7-liter capacity, fast-boil tech, auto shut-off, and a 360-degree swivel base."
print("Summary:", summarize(text, model, tokenizer))
```

---

## 📁 Repository Structure

```
.
├── model/                    # Fine-tuned model files (pytorch_model.bin, config.json)
├── tokenizer/                # Tokenizer config and vocab
├── training_script.py        # Training code
├── product_descriptions.csv  # Source dataset
├── utils.py                  # Preprocessing & summarization utilities
└── README.md                 # Model card
```

---

## 🤝 Contributing

Feel free to open issues or suggest improvements via pull requests. Further training on real-world data and multilingual support are planned for future updates.