---
title: Multimodal Rag Hm
emoji: π
colorFrom: purple
colorTo: yellow
sdk: gradio
sdk_version: 5.34.2
app_file: app.py
pinned: false
short_description: A simple Multimodal RAG on top of H&M fashion data
---
# Fashion Multimodal RAG Assistant

This project implements a complete multimodal RAG (Retrieval-Augmented Generation) pipeline that can search through fashion items using both text and image queries, then generate helpful responses using an LLM.
## Features

- **Multimodal Search**: Search for fashion items using either text descriptions or image uploads
- **Vector Similarity**: Powered by CLIP embeddings for high-quality similarity matching
- **AI-Generated Recommendations**: Get personalized fashion recommendations based on your search
- **Interactive Web Interface**: Easy-to-use Gradio interface for a seamless experience
## How It Works

The pipeline consists of three main phases (sketched in the example below):

1. **Retrieval**: Finds similar fashion items via vector search over CLIP embeddings
2. **Augmentation**: Builds an enhanced prompt from the retrieved context in the fashion database
3. **Generation**: Produces helpful, creative responses with an instruction-tuned LLM (Qwen2.5-0.5B-Instruct)
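
A minimal sketch of the three phases in Python. The helper names, the LanceDB column names, and the prompt template are illustrative assumptions, not the exact code in `app.py`:

```python
# Illustrative sketch only: helper names, the table schema ("vector", "text"),
# and the prompt template are assumptions, not the project's actual code.
from transformers import AutoModelForCausalLM, AutoTokenizer

def retrieve(query_embedding, table, k=5):
    # Phase 1: nearest-neighbour search over CLIP embeddings stored in LanceDB
    return table.search(query_embedding).limit(k).to_list()

def augment(query, items):
    # Phase 2: fold the retrieved captions into the prompt as context
    context = "\n".join(item["text"] for item in items)
    return f"Relevant fashion items:\n{context}\n\nUser request: {query}\nRecommendation:"

def generate(prompt):
    # Phase 3: produce a recommendation with the instruction-tuned LLM
    tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-0.5B-Instruct")
    model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-0.5B-Instruct")
    inputs = tokenizer(prompt, return_tensors="pt")
    output = model.generate(**inputs, max_new_tokens=200)
    # Strip the prompt tokens and return only the newly generated text
    return tokenizer.decode(output[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True)
```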
## Dataset

The project uses the H&M Fashion Caption Dataset:

- 20K+ fashion items with images and text descriptions
- Source: [H&M Fashion Caption Dataset on HuggingFace](https://huggingface.co/datasets/tomytjandra/h-and-m-fashion-caption)
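
The dataset can be pulled straight from the Hub with the `datasets` library; the split name and the exact column names are assumptions worth checking against the dataset card:

```python
# Load the H&M Fashion Caption Dataset from the Hugging Face Hub.
# The "train" split and the column layout are assumptions; see the dataset card.
from datasets import load_dataset

dataset = load_dataset("tomytjandra/h-and-m-fashion-caption", split="train")
print(dataset)     # number of rows and column names
print(dataset[0])  # a single item: an image plus its caption
```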
## Technical Details

- **Vector Database**: LanceDB for efficient similarity search
- **Embedding Model**: CLIP for multimodal embeddings
- **LLM**: Qwen/Qwen2.5-0.5B-Instruct for response generation
- **Web Interface**: Gradio for an interactive user experience
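
As a rough illustration of how these pieces fit together, the sketch below embeds a text query with CLIP and searches a LanceDB table. The CLIP checkpoint, database path, table name, and row schema are assumptions, not the project's exact configuration:

```python
# Sketch of CLIP + LanceDB retrieval; the checkpoint, DB path, table name,
# and row schema are assumptions for illustration.
import lancedb
from transformers import CLIPModel, CLIPProcessor

clip = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def embed_text(text: str):
    # CLIP maps text and images into the same embedding space,
    # so a single index can serve both query types.
    inputs = processor(text=[text], return_tensors="pt", padding=True, truncation=True)
    return clip.get_text_features(**inputs)[0].detach().numpy()

db = lancedb.connect("./lancedb")
table = db.open_table("fashion_items")  # assumed columns: "vector", "text", "image"

# Retrieve the five items closest to a text query
results = table.search(embed_text("black dress for evening")).limit(5).to_list()
for row in results:
    print(row["text"])
```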
## Usage

You can interact with the application in two ways:

### Web Interface

The app comes with a Gradio web interface for easy interaction:

```
python app.py --app
```
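
For reference, a minimal Gradio interface covering the two query modes could look like the sketch below; `answer_query` is a hypothetical stand-in for the app's real pipeline function:

```python
# Minimal Gradio wiring; `answer_query` is a hypothetical placeholder for the
# project's retrieval + generation pipeline, not its actual function name.
import gradio as gr

def answer_query(text_query, image):
    # Embed the text or image, retrieve from LanceDB, then generate with the LLM.
    return "Recommendation would appear here."

demo = gr.Interface(
    fn=answer_query,
    inputs=[
        gr.Textbox(label="Describe the item you want"),
        gr.Image(type="pil", label="...or upload a photo"),
    ],
    outputs=gr.Textbox(label="Recommendation"),
    title="Fashion Multimodal RAG Assistant",
)

demo.launch()
```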
### Command Line

You can also use the command line for specific queries:

```
# Text query
python app.py --query "black dress for evening"

# Image query (if you have an image file)
python app.py --query "path/to/fashion/image.jpg"
```
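
The same `--query` flag accepts either a text description or an image path. One plausible way to dispatch between the two (an assumption, not copied from `app.py`) is to check whether the argument is an existing file:

```python
# Plausible CLI dispatch; the file-existence check is an assumption about how
# app.py distinguishes text queries from image paths.
import argparse
import os

parser = argparse.ArgumentParser(description="Fashion Multimodal RAG Assistant")
parser.add_argument("--app", action="store_true", help="launch the Gradio web interface")
parser.add_argument("--query", type=str, help="text description or path to an image file")
args = parser.parse_args()

if args.query:
    if os.path.isfile(args.query):
        print(f"Treating {args.query!r} as an image query")
    else:
        print(f"Treating {args.query!r} as a text query")
```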
## Installation

To run this project locally:

1. Clone the repository
2. Install dependencies:

   ```
   pip install -r requirements.txt
   ```

3. Run the application:

   ```
   python app.py --app
   ```
## License

This project uses the H&M Fashion Caption Dataset, which is publicly available on HuggingFace.
## Acknowledgements

- H&M Fashion Dataset by [tomytjandra](https://huggingface.co/datasets/tomytjandra/h-and-m-fashion-caption)
- Built with LanceDB, CLIP, and Qwen LLM

Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference | |