---
title: Multimodal Rag Hm
emoji: π
colorFrom: purple
colorTo: yellow
sdk: gradio
sdk_version: 5.34.2
app_file: app.py
pinned: false
short_description: A simple Multimodal RAG on top of H&M fashion data
---
# Fashion Multimodal RAG Assistant

This project implements a complete multimodal RAG (Retrieval-Augmented Generation) pipeline that can search through fashion items using both text and image queries, then generate helpful responses using an LLM.
## Features

- **Multimodal Search**: Search for fashion items using either text descriptions or image uploads
- **Vector Similarity**: Powered by CLIP embeddings for high-quality similarity matching
- **AI-Generated Recommendations**: Get personalized fashion recommendations based on your search
- **Interactive Web Interface**: Easy-to-use Gradio interface for a seamless experience
## How It Works

The pipeline consists of three main phases (sketched in the example below):

1. **Retrieval**: Finds similar fashion items via vector search over CLIP embeddings
2. **Augmentation**: Builds an enhanced prompt from the retrieved context in the fashion database
3. **Generation**: Produces helpful, creative responses with an instruction-tuned LLM (Qwen2.5-0.5B-Instruct)
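
A minimal sketch of the three phases in Python. The helper names, the LanceDB column names, and the prompt template are illustrative assumptions, not the exact code in `app.py`:

```python
# Illustrative sketch only: helper names, the table schema ("vector", "text"),
# and the prompt template are assumptions, not the project's actual code.
from transformers import AutoModelForCausalLM, AutoTokenizer

def retrieve(query_embedding, table, k=5):
    # Phase 1: nearest-neighbour search over CLIP embeddings stored in LanceDB
    return table.search(query_embedding).limit(k).to_list()

def augment(query, items):
    # Phase 2: fold the retrieved captions into the prompt as context
    context = "\n".join(item["text"] for item in items)
    return f"Relevant fashion items:\n{context}\n\nUser request: {query}\nRecommendation:"

def generate(prompt):
    # Phase 3: produce a recommendation with the instruction-tuned LLM
    tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-0.5B-Instruct")
    model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-0.5B-Instruct")
    inputs = tokenizer(prompt, return_tensors="pt")
    output = model.generate(**inputs, max_new_tokens=200)
    # Strip the prompt tokens and return only the newly generated text
    return tokenizer.decode(output[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True)
```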
## Dataset

The project uses the H&M Fashion Caption Dataset:

- 20K+ fashion items with images and text descriptions
- Source: [H&M Fashion Caption Dataset on HuggingFace](https://huggingface.co/datasets/tomytjandra/h-and-m-fashion-caption)
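
The dataset can be pulled straight from the Hub with the `datasets` library; the split name and the exact column names are assumptions worth checking against the dataset card:

```python
# Load the H&M Fashion Caption Dataset from the Hugging Face Hub.
# The "train" split and the column layout are assumptions; see the dataset card.
from datasets import load_dataset

dataset = load_dataset("tomytjandra/h-and-m-fashion-caption", split="train")
print(dataset)     # number of rows and column names
print(dataset[0])  # a single item: an image plus its caption
```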
## Technical Details

- **Vector Database**: LanceDB for efficient similarity search
- **Embedding Model**: CLIP for multimodal embeddings
- **LLM**: Qwen/Qwen2.5-0.5B-Instruct for response generation
- **Web Interface**: Gradio for an interactive user experience
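
As a rough illustration of how these pieces fit together, the sketch below embeds a text query with CLIP and searches a LanceDB table. The CLIP checkpoint, database path, table name, and row schema are assumptions, not the project's exact configuration:

```python
# Sketch of CLIP + LanceDB retrieval; the checkpoint, DB path, table name,
# and row schema are assumptions for illustration.
import lancedb
from transformers import CLIPModel, CLIPProcessor

clip = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def embed_text(text: str):
    # CLIP maps text and images into the same embedding space,
    # so a single index can serve both query types.
    inputs = processor(text=[text], return_tensors="pt", padding=True, truncation=True)
    return clip.get_text_features(**inputs)[0].detach().numpy()

db = lancedb.connect("./lancedb")
table = db.open_table("fashion_items")  # assumed columns: "vector", "text", "image"

# Retrieve the five items closest to a text query
results = table.search(embed_text("black dress for evening")).limit(5).to_list()
for row in results:
    print(row["text"])
```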
## Usage

You can interact with the application in two ways:

### Web Interface

The app comes with a Gradio web interface for easy interaction:

```
python app.py --app
```
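
For reference, a minimal Gradio interface covering the two query modes could look like the sketch below; `answer_query` is a hypothetical stand-in for the app's real pipeline function:

```python
# Minimal Gradio wiring; `answer_query` is a hypothetical placeholder for the
# project's retrieval + generation pipeline, not its actual function name.
import gradio as gr

def answer_query(text_query, image):
    # Embed the text or image, retrieve from LanceDB, then generate with the LLM.
    return "Recommendation would appear here."

demo = gr.Interface(
    fn=answer_query,
    inputs=[
        gr.Textbox(label="Describe the item you want"),
        gr.Image(type="pil", label="...or upload a photo"),
    ],
    outputs=gr.Textbox(label="Recommendation"),
    title="Fashion Multimodal RAG Assistant",
)

demo.launch()
```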
### Command Line

You can also use the command line for specific queries:

```
# Text query
python app.py --query "black dress for evening"

# Image query (if you have an image file)
python app.py --query "path/to/fashion/image.jpg"
```
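
The same `--query` flag accepts either a text description or an image path. One plausible way to dispatch between the two (an assumption, not copied from `app.py`) is to check whether the argument is an existing file:

```python
# Plausible CLI dispatch; the file-existence check is an assumption about how
# app.py distinguishes text queries from image paths.
import argparse
import os

parser = argparse.ArgumentParser(description="Fashion Multimodal RAG Assistant")
parser.add_argument("--app", action="store_true", help="launch the Gradio web interface")
parser.add_argument("--query", type=str, help="text description or path to an image file")
args = parser.parse_args()

if args.query:
    if os.path.isfile(args.query):
        print(f"Treating {args.query!r} as an image query")
    else:
        print(f"Treating {args.query!r} as a text query")
```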
## Installation

To run this project locally:

1. Clone the repository
2. Install dependencies:

   ```
   pip install -r requirements.txt
   ```

3. Run the application:

   ```
   python app.py --app
   ```
## License

This project uses the H&M Fashion Caption Dataset, which is publicly available on HuggingFace.
## Acknowledgements

- H&M Fashion Dataset by [tomytjandra](https://huggingface.co/datasets/tomytjandra/h-and-m-fashion-caption)
- Built with LanceDB, CLIP, and Qwen LLM

Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference | |