Update README.md
README.md
CHANGED
@@ -8,5 +8,92 @@ sdk_version: 1.38.0
 app_file: app.py
 pinned: false
 ---
-Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
+# Qwen2-Colpali-OCR
+
+
+This application demonstrates a multimodal Retrieval-Augmented Generation (RAG) system built on the Qwen2-VL model and a custom RAG implementation. Users upload an image and ask questions about it; the app combines visual and textual information to generate a response.
+It is deployed on Hugging Face Spaces at [https://huggingface.co/spaces/clayton07/qwen2-colpali-ocr](https://huggingface.co/spaces/clayton07/qwen2-colpali-ocr).
+
+## Prerequisites
+
+- Python 3.8+
+- CUDA-compatible GPU (recommended for optimal performance)
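
The two prerequisites above can be checked before installing anything else. The following is a minimal sketch, not part of the repository: it assumes the app runs on PyTorch (the README does not say so explicitly) and simply reports no CUDA support when `torch` is not installed. The function name `check_prerequisites` is hypothetical.

```python
import importlib.util
import sys

MIN_PYTHON = (3, 8)  # minimum version listed under Prerequisites


def check_prerequisites():
    """Return a dict describing whether this machine meets the listed prerequisites."""
    report = {
        "python_ok": sys.version_info >= MIN_PYTHON,
        "torch_installed": importlib.util.find_spec("torch") is not None,
        "cuda_available": False,
    }
    if report["torch_installed"]:
        try:
            # Assumption: the app uses PyTorch; this is PyTorch's own CUDA check.
            import torch

            report["cuda_available"] = bool(torch.cuda.is_available())
        except ImportError:
            # A broken install is treated the same as a missing one.
            report["torch_installed"] = False
    return report


if __name__ == "__main__":
    for name, value in check_prerequisites().items():
        print(f"{name}: {value}")
```

A `cuda_available` value of `False` does not prevent the app from running; it only means inference will likely fall back to the CPU and be slower.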
+
+## Installation
+
+1. Clone the repository:
+```
+git clone https://github.com/your-username/multimodal-rag-app.git
+cd multimodal-rag-app
+```
+
+2. Create a virtual environment:
+```
+python -m venv venv
+source venv/bin/activate # On Windows, use `venv\Scripts\activate`
+```
+
+3. Install the required packages:
+```
+pip install -r requirements.txt
+```
+
+## Running the Application Locally
+
+1. Ensure you're in the project directory and your virtual environment is activated.
+
+2. Run the Streamlit app:
+```
+streamlit run app.py
+```
+
+3. Open a web browser and navigate to the URL provided by Streamlit (usually `http://localhost:8501`).
+
+
+## Features
+
+- Image upload or selection of an example image
+- Text-based querying of uploaded images
+- Multimodal RAG processing using a custom RAG model and Qwen2-VL
+- Adjustable response length
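
The README does not describe how the retrieval step ranks images. ColPali, which the Acknowledgments credit, is published as a late-interaction retriever: each query-token embedding is matched to its best-scoring image-patch embedding, and the per-token maxima are summed. The sketch below illustrates that scoring rule with toy two-dimensional vectors in plain Python. It is an illustration of the general technique under those assumptions, not the app's actual code, and the function names and file names are hypothetical.

```python
def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def late_interaction_score(query_vecs, page_vecs):
    """Sum, over query vectors, of the best dot-product match among page vectors."""
    return sum(max(dot(q, p) for p in page_vecs) for q in query_vecs)


def rank_pages(query_vecs, pages):
    """Return page names sorted from best to worst late-interaction score."""
    scores = {
        name: late_interaction_score(query_vecs, vecs)
        for name, vecs in pages.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


# Toy embeddings: two query tokens and two patches per image (made-up values).
query = [[1.0, 0.0], [0.0, 1.0]]
pages = {
    "photo.png": [[0.1, 0.1], [0.3, 0.2]],
    "invoice.png": [[0.9, 0.1], [0.2, 0.8]],
}

ranking = rank_pages(query, pages)
```

In a full pipeline the top-ranked image would then be passed, together with the text query, to the vision-language model that writes the answer.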
+
+
+## Usage
+
+1. Choose to upload an image or use the example image.
+2. If uploading, select an image file (PNG, JPG, or JPEG).
+3. Enter a text query about the image in the provided input field.
+4. Adjust the maximum number of tokens for the response using the slider.
+5. View the generated response based on the image and your query.
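
The inputs collected in the steps above (an image file, a text query, and a token limit) can be validated before any model is called. This is a minimal sketch of such a check. The helper `validate_request` and the slider bounds `MIN_TOKENS` and `MAX_TOKENS` are hypothetical; the README names the accepted file types but not the slider's real range.

```python
from pathlib import Path

ALLOWED_EXTENSIONS = {".png", ".jpg", ".jpeg"}  # formats named in step 2
# Hypothetical slider bounds; the README does not state the real range.
MIN_TOKENS, MAX_TOKENS = 16, 512


def validate_request(filename, query, max_new_tokens):
    """Return a list of problems with the request; an empty list means it is valid."""
    problems = []
    if Path(filename).suffix.lower() not in ALLOWED_EXTENSIONS:
        problems.append(f"unsupported image type: {filename}")
    if not query.strip():
        problems.append("query is empty")
    if not MIN_TOKENS <= max_new_tokens <= MAX_TOKENS:
        problems.append(
            f"max_new_tokens must be between {MIN_TOKENS} and {MAX_TOKENS}"
        )
    return problems
```

Returning a list of problems rather than raising on the first one lets a UI show every issue at once, for example `validate_request("notes.pdf", "", 1000)` reports all three.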
+
+## Deployment
+
+This application can be deployed on various platforms that support Streamlit apps. Here are general steps for deployment:
+
+1. Ensure all dependencies are listed in `requirements.txt`.
+2. Choose a deployment platform (e.g., Streamlit Cloud, Heroku, or a cloud provider like AWS or GCP).
+3. Follow the platform-specific deployment instructions, which typically involve:
+   - Connecting your GitHub repository to the deployment platform
+   - Configuring environment variables if necessary
+   - Setting up any required build processes
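
For the environment-variable step, one common pattern is to read settings once at startup and fall back to safe defaults, so the same code runs locally and on a hosted platform. The sketch below assumes two hypothetical variables, `APP_DEVICE` and `APP_MAX_NEW_TOKENS`; the README does not name any variables the app actually reads.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class AppSettings:
    device: str
    max_new_tokens: int


def load_settings(env=os.environ):
    """Read settings from environment variables, falling back to defaults."""
    # APP_DEVICE and APP_MAX_NEW_TOKENS are hypothetical names chosen for this sketch.
    device = env.get("APP_DEVICE", "cpu").lower()
    if device not in {"cpu", "cuda"}:
        raise ValueError(f"APP_DEVICE must be 'cpu' or 'cuda', got {device!r}")
    max_new_tokens = int(env.get("APP_MAX_NEW_TOKENS", "128"))
    if max_new_tokens <= 0:
        raise ValueError("APP_MAX_NEW_TOKENS must be a positive integer")
    return AppSettings(device=device, max_new_tokens=max_new_tokens)
```

Passing the mapping as a parameter keeps the function easy to test: `load_settings({})` yields the defaults, and a CPU-only host needs no configuration at all.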
+
+Note: For optimal performance, deploy on a platform that provides GPU support.
+
+## Disclaimer
+
+The app utilizes the free tier of Hugging Face Spaces, which only supports CPU, resulting in slower processing times. For optimal performance, run the app locally on a machine with GPU support.
+
+## Contributing
+
+Contributions are welcome! Please feel free to submit a Pull Request.
+
+## License
+
+GNU General Public License v2
+
+## Acknowledgments
+
+- This project uses the [Qwen2-VL model](https://huggingface.co/Qwen/Qwen2-VL-2B-Instruct), hosted on Hugging Face.
+- The custom RAG implementation is based on the [ColPali model](https://huggingface.co/vidore/colpali).
+