---
title: Multimodal RAG H&M
emoji: πŸ‘€
colorFrom: purple
colorTo: yellow
sdk: gradio
sdk_version: 5.34.2
app_file: app.py
pinned: false
short_description: A simple Multimodal RAG on top of H&M fashion data
---

# πŸ‘— Fashion Multimodal RAG Assistant

This project implements a complete multimodal RAG (Retrieval-Augmented Generation) pipeline that searches fashion items using either text or image queries and then generates helpful responses with an LLM.

## πŸ” Features

- **Multimodal Search**: Search for fashion items using either text descriptions or image uploads
- **Vector Similarity**: Powered by CLIP embeddings for high-quality similarity matching
- **AI-Generated Recommendations**: Get personalized fashion recommendations based on your search
- **Interactive Web Interface**: Easy-to-use Gradio interface for a seamless experience

## πŸš€ How It Works

The pipeline consists of three main phases, sketched in the code below:

1. **Retrieval**: Finds similar fashion items using vector search with CLIP embeddings
2. **Augmentation**: Creates enhanced prompts with retrieved context from the fashion database
3. **Generation**: Generates helpful, creative responses using an instruction-tuned LLM (Qwen2.5-0.5B-Instruct)
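
The sketch below shows one way those three phases could fit together with LanceDB, CLIP (via `sentence-transformers`), and Qwen. The database path, table name, and column names are illustrative assumptions, not the app's actual code:

```
import lancedb
from sentence_transformers import SentenceTransformer
from transformers import pipeline

embedder = SentenceTransformer("clip-ViT-B-32")   # CLIP text/image encoder
llm = pipeline("text-generation", model="Qwen/Qwen2.5-0.5B-Instruct")

db = lancedb.connect("data/lancedb")              # assumed DB path
table = db.open_table("fashion")                  # assumed table name

def answer(query: str, k: int = 3) -> str:
    # 1. Retrieval: embed the query and fetch the k nearest items
    hits = table.search(embedder.encode(query)).limit(k).to_list()

    # 2. Augmentation: fold the retrieved captions into the prompt
    context = "\n".join(h["text"] for h in hits)  # assumes a "text" column
    prompt = f"Fashion items:\n{context}\n\nQuestion: {query}\nAnswer:"

    # 3. Generation: the small instruction-tuned LLM writes the reply
    out = llm(prompt, max_new_tokens=200)[0]["generated_text"]
    return out[len(prompt):].strip()
```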

## πŸ“Š Dataset

The project uses the H&M Fashion Caption Dataset:
- 20K+ fashion items with images and text descriptions
- Source: [H&M Fashion Caption Dataset on HuggingFace](https://huggingface.co/datasets/tomytjandra/h-and-m-fashion-caption)
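
For a quick look at the data, it loads directly with the `datasets` library; the `text` column name below is an assumption based on typical caption datasets:

```
from datasets import load_dataset

ds = load_dataset("tomytjandra/h-and-m-fashion-caption", split="train")
print(ds)              # row count and column names
print(ds[0]["text"])   # caption for the first item (assumed column name)
```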

## πŸ”§ Technical Details

- **Vector Database**: LanceDB for efficient similarity search
- **Embedding Model**: CLIP for multimodal embeddings
- **LLM**: Qwen/Qwen2.5-0.5B-Instruct for response generation
- **Web Interface**: Gradio for interactive user experience
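
For context, here is a minimal sketch of how items might be embedded and indexed in LanceDB; the table name, schema, and stand-in captions are assumptions, not the project's actual ingestion code:

```
import lancedb
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("clip-ViT-B-32")
db = lancedb.connect("data/lancedb")              # assumed DB path

captions = ["black evening dress", "striped cotton t-shirt"]  # stand-in data
records = [{"vector": embedder.encode(c).tolist(), "text": c} for c in captions]
table = db.create_table("fashion", data=records, mode="overwrite")

# Nearest-neighbour lookup over the stored vectors
print(table.search(embedder.encode("dress")).limit(1).to_list())
```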

## πŸ’» Usage

You can interact with the application in two ways:

### Web Interface
The app comes with a Gradio web interface for easy interaction:
```
python app.py --app
```

### Command Line
You can also use the command line for specific queries:
```
# Text query
python app.py --query "black dress for evening"

# Image query (if you have an image file)
python app.py --query "path/to/fashion/image.jpg"
```
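
One way `--query` could accept both forms is to check whether the string points at an image file and embed accordingly; CLIP via `sentence-transformers` encodes text strings and PIL images with the same model. This is a hedged sketch, not the app's actual dispatch logic:

```
import os
from PIL import Image
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("clip-ViT-B-32")

def embed_query(query: str):
    # Image query: the string points at an existing image file
    if os.path.isfile(query) and query.lower().endswith((".jpg", ".jpeg", ".png")):
        return embedder.encode(Image.open(query))
    # Text query: embed the description directly
    return embedder.encode(query)
```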

## πŸ› οΈ Installation

To run this project locally:

1. Clone the repository
2. Install dependencies:
   ```
   pip install -r requirements.txt
   ```
3. Run the application:
   ```
   python app.py --app
   ```

## πŸ“ License

This project uses the H&M Fashion Caption Dataset, which is publicly available on HuggingFace.

## πŸ™ Acknowledgements

- H&M Fashion Dataset by [tomytjandra](https://huggingface.co/datasets/tomytjandra/h-and-m-fashion-caption)
- Built with LanceDB, CLIP, and Qwen LLM

Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference