---
title: Multimodal Rag Hm
emoji: π
colorFrom: purple
colorTo: yellow
sdk: gradio
sdk_version: 5.34.2
app_file: app.py
pinned: false
short_description: A simple Multimodal RAG on top of H&M fashion data
---
# Fashion Multimodal RAG Assistant
This project implements a complete multimodal RAG (Retrieval-Augmented Generation) pipeline that searches fashion items using either text or image queries and then generates helpful responses with an LLM.
## Features
- **Multimodal Search**: Search for fashion items using either text descriptions or image uploads
- **Vector Similarity**: Powered by CLIP embeddings for high-quality similarity matching (see the embedding sketch after this list)
- **AI-Generated Recommendations**: Get personalized fashion recommendations based on your search
- **Interactive Web Interface**: Easy-to-use Gradio interface for a seamless experience
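Both of the search features above rely on CLIP placing text and images in the same embedding space. A minimal sketch of how such embeddings could be computed with the `transformers` library is shown below; the checkpoint name and helper functions are illustrative assumptions, not necessarily what `app.py` uses.

```python
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

MODEL_ID = "openai/clip-vit-base-patch32"  # assumed checkpoint; the app may use another
model = CLIPModel.from_pretrained(MODEL_ID)
processor = CLIPProcessor.from_pretrained(MODEL_ID)

def embed_text(text: str):
    """CLIP embedding for a text query."""
    inputs = processor(text=[text], return_tensors="pt", padding=True)
    return model.get_text_features(**inputs)[0].detach().numpy()

def embed_image(image: Image.Image):
    """CLIP embedding for a PIL image (e.g. an uploaded photo)."""
    inputs = processor(images=image, return_tensors="pt")
    return model.get_image_features(**inputs)[0].detach().numpy()
```

Because text and image embeddings share one vector space, a text query can be matched directly against pre-computed image embeddings of the catalogue.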
## How It Works
The pipeline consists of three main phases (a minimal end-to-end sketch follows the list):
1. **Retrieval**: Finds similar fashion items using vector search with CLIP embeddings
2. **Augmentation**: Creates enhanced prompts with retrieved context from the fashion database
3. **Generation**: Generates helpful, creative responses using an instruction-tuned LLM (Qwen2.5-0.5B-Instruct)
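The sketch below shows how the three phases could fit together, reusing the `embed_text` helper from the CLIP sketch above. The table name, column names, and prompt wording are assumptions rather than the exact code in `app.py`.

```python
import lancedb
from transformers import pipeline

db = lancedb.connect("./lancedb")        # local vector store
table = db.open_table("fashion_items")   # assumed table with "vector" and "text" columns
generator = pipeline("text-generation", model="Qwen/Qwen2.5-0.5B-Instruct")

def rag_answer(query: str, k: int = 5) -> str:
    # 1. Retrieval: nearest neighbours of the query embedding
    #    (an image query would use embed_image instead of embed_text)
    hits = table.search(embed_text(query)).limit(k).to_list()
    # 2. Augmentation: fold the retrieved captions into the prompt
    context = "\n".join(f"- {hit['text']}" for hit in hits)
    prompt = (
        "You are a helpful fashion assistant. Relevant catalogue items:\n"
        f"{context}\n\nShopper request: {query}\nRecommendation:"
    )
    # 3. Generation: the small instruction-tuned LLM writes the final answer
    return generator(prompt, max_new_tokens=200, return_full_text=False)[0]["generated_text"]
```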
## Dataset
The project uses the H&M Fashion Caption Dataset (a quick loading snippet follows the list):
- 20K+ fashion items with images and text descriptions
- Source: [H&M Fashion Caption Dataset on HuggingFace](https://huggingface.co/datasets/tomytjandra/h-and-m-fashion-caption)
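A quick way to peek at the data; the `image` and `text` field names below follow the dataset card linked above, but verify them before relying on this snippet.

```python
from datasets import load_dataset

ds = load_dataset("tomytjandra/h-and-m-fashion-caption", split="train")
print(ds)                                # row count and column names
print(ds[0]["text"])                     # caption of the first item
ds[0]["image"].save("sample_item.jpg")   # images are loaded as PIL objects
```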
## Technical Details
- **Vector Database**: LanceDB for efficient similarity search (see the indexing sketch after this list)
- **Embedding Model**: CLIP for multimodal embeddings
- **LLM**: Qwen/Qwen2.5-0.5B-Instruct for response generation
- **Web Interface**: Gradio for interactive user experience
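As a rough illustration of how these pieces connect, the sketch below indexes a subset of the dataset into LanceDB with the CLIP `embed_image` helper from earlier; the schema and table name are assumptions and may differ from what `app.py` actually builds.

```python
import lancedb

db = lancedb.connect("./lancedb")
records = [
    {"vector": embed_image(item["image"]).tolist(), "text": item["text"]}
    for item in ds.select(range(1000))   # small subset keeps the demo fast
]
table = db.create_table("fashion_items", data=records, mode="overwrite")
print(table.count_rows(), "items indexed")
```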
## Usage
You can interact with the application in two ways:
### Web Interface
The app comes with a Gradio web interface for easy interaction:
```
python app.py --app
```
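For reference, an interface like this could be wired up roughly as follows; this is a simplified, text-only sketch reusing the hypothetical `rag_answer` helper from the pipeline sketch above (the real app also accepts image uploads).

```python
import gradio as gr

demo = gr.Interface(
    fn=rag_answer,                       # pipeline sketch from "How It Works"
    inputs=gr.Textbox(label="Describe the item you are looking for"),
    outputs=gr.Textbox(label="Recommendation"),
    title="Fashion Multimodal RAG Assistant",
)
demo.launch()
```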
### Command Line
You can also use the command line for specific queries:
```
# Text query
python app.py --query "black dress for evening"
# Image query (if you have an image file)
python app.py --query "path/to/fashion/image.jpg"
```
## Installation
To run this project locally:
1. Clone the repository
2. Install dependencies:
```
pip install -r requirements.txt
```
3. Run the application:
```
python app.py --app
```
## License
This project uses the H&M Fashion Caption Dataset, which is publicly available on HuggingFace.
## Acknowledgements
- H&M Fashion Dataset by [tomytjandra](https://huggingface.co/datasets/tomytjandra/h-and-m-fashion-caption)
- Built with LanceDB, CLIP, and Qwen LLM
Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference