Orel MAZOR committed on
Commit 3cce64e · 1 Parent(s): 08f3a23
Files changed (3)
  1. .DS_Store +0 -0
  2. .github/workflows/sync_to_hf_space.yml +20 -0
  3. README.md +177 -15
.DS_Store CHANGED
Binary files a/.DS_Store and b/.DS_Store differ
 
.github/workflows/sync_to_hf_space.yml ADDED
@@ -0,0 +1,20 @@
+ name: Sync to Hugging Face Space
+ on:
+   push:
+     branches: [main]
+ jobs:
+   sync-to-hub:
+     runs-on: ubuntu-latest
+     steps:
+       - uses: actions/checkout@v3
+         with:
+           fetch-depth: 0
+           lfs: true
+       - name: Push to Hugging Face Space
+         env:
+           HF_TOKEN: ${{ secrets.HF_TOKEN }}
+         run: |
+           git config --global user.email "[email protected]"
+           git config --global user.name "Your Name"
+           git remote add space https://huggingface.co/spaces/Coool2/Final_Assignment_Template
+           git push -f https://Coool2:[email protected]/spaces/Coool2/Final_Assignment_Template main
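Note on the push step: the remote URL in the last line of the workflow carries credentials, and the page's e-mail obfuscation has replaced part of it with `[email protected]`. As a hedged illustration only, the sketch below shows how a token-authenticated Space URL is commonly assembled, assuming the conventional `user:token@host` form. `build_space_push_url` and `mask_token` are hypothetical helpers and are not part of this commit.

```python
import os
from urllib.parse import quote


def build_space_push_url(user: str, space: str, token: str) -> str:
    """Return an HTTPS remote URL with basic-auth credentials embedded.

    Assumes the conventional https://USER:TOKEN@huggingface.co/spaces/USER/SPACE
    layout; this is an illustration, not the exact line from the workflow.
    """
    # Percent-encode the credentials so characters such as '@' or ':'
    # inside a token cannot break the URL structure.
    return (
        f"https://{quote(user, safe='')}:{quote(token, safe='')}"
        f"@huggingface.co/spaces/{user}/{space}"
    )


def mask_token(url: str, token: str) -> str:
    """Replace the token with asterisks so the URL is safe to log."""
    if not token:
        # Nothing to hide; avoid str.replace with an empty pattern.
        return url
    return url.replace(quote(token, safe=""), "***")


if __name__ == "__main__":
    # HF_TOKEN mirrors the secret name used in the workflow;
    # the fallback value is only a placeholder.
    token = os.environ.get("HF_TOKEN", "hf_dummy")
    url = build_space_push_url("Coool2", "Final_Assignment_Template", token)
    print(mask_token(url, token))
```

Because the workflow pushes with `-f`, whatever history the Space already holds is overwritten by the GitHub `main` branch on every run.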
README.md CHANGED
@@ -1,15 +1,177 @@
- ---
- title: Template Final Assignment
- emoji: 🕵🏻‍♂️
- colorFrom: indigo
- colorTo: indigo
- sdk: gradio
- sdk_version: 5.25.2
- app_file: app.py
- pinned: false
- hf_oauth: true
- # optional, default duration is 8 hours/480 minutes. Max duration is 30 days/43200 minutes.
- hf_oauth_expiration_minutes: 480
- ---
-
- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
+ # 🤖 Advanced GAIA Agents Challenge Solution
+
+ A comprehensive solution for the [Hugging Face Agents Course Unit 4 GAIA Challenge](https://huggingface.co/learn/agents-course/unit4/hands-on), featuring advanced multimodal AI agents with dynamic RAG capabilities, quantized models for Kaggle compatibility, and both synchronous/asynchronous execution modes.
+
+ ## 🌟 Features
+
+ ### 🧠 Dual Agent Architecture
+ - **Agent 1 (LlamaIndex)**: Advanced multimodal agent with dynamic knowledge base and hybrid reranking
+ - **Agent 2 (Smolagents)**: Gemini-powered agent with BM25 retrieval and observability
+
+ ### Features for Agent 1
+ ### 🎯 Multimodal Capabilities
+ - **BAAI Visualized Embedding**: BGE-M3 based multimodal embeddings running on cuda:1
+ - **Pixtral 12B Quantized**: FP8/4-bit quantized vision-language model for resource-constrained environments
+ - **Hybrid Retrieval**: Text + visual content processing with ColPali and SentenceTransformer reranking
+
+ ### ⚡ Execution Modes
+ - **Asynchronous Mode**: Concurrent question processing for maximum speed
+ - **Kaggle Compatibility**: Optimized for resource-constrained environments
+
+ ### 🔍 Advanced RAG System
+ - **Dynamic Knowledge Base**: Automatically updated with web search results
+ - **Multimodal Parsing**: Handles text, images, PDFs, audio, and video files
+ - **Smart Reranking**: Hybrid approach combining text and visual rerankers
+
+ ## 🏗️ Architecture
+
+ ```
+ ┌─────────────────────────────────────────────┐
+ │                     APP                     │
+ │             (Async/Sync Modes)              │
+ └─────────────────┬───────────────────────────┘
+                   │
+          ┌────────┴────────┐
+          │                 │
+     ┌────▼────┐       ┌────▼────┐
+     │Agent 1  │       │Agent 2  │
+     │LlamaIdx │       │Smolagent│
+     └────┬────┘       └────┬────┘
+          │                 │
+     ┌────▼────┐       ┌────▼────┐
+     │Dynamic  │       │BM25 +   │
+     │RAG +    │       │Langfuse │
+     │Hybrid   │       │Observ.  │
+     │Rerank   │       │         │
+     └─────────┘       └─────────┘
+ ```
+
+ ## 🚀 Quick Start
+
+ ### Prerequisites
+
+ ### Installation
+
+ 1. **Clone the repository**:
+ ```bash
+ git clone https://github.com/yourusername/gaia-agents-challenge
+ cd gaia-agents-challenge
+ ```
+
+ 2. **Install FlagEmbedding with visual support**:
+ ```bash
+ git clone https://github.com/FlagOpen/FlagEmbedding.git
+ cd FlagEmbedding/research/visual_bge
+ pip install -e .
+ cd ../../..
+ ```
+
+ 3. **Install additional dependencies**:
+ #### For Agent 1:
+ ```bash
+ pip install -r requirements.txt
+ ```
+ #### For Agent 2:
+ ```bash
+ pip install -r requirements2.txt
+ ```
+
+
+ 4. **Set environment variables**:
+ ```bash
+ export GOOGLE_API_KEY="your_gemini_api_key"
+ export HUGGINGFACEHUB_API_TOKEN="your_hf_token"
+ export LANGFUSE_PUBLIC_KEY="your_langfuse_public_key" # Optional
+ export LANGFUSE_SECRET_KEY="your_langfuse_secret_key" # Optional
+ ```
+
+ ### Usage
+
+ ```bash
+ # LlamaIndex Agent
+ python agent.py
+
+ # Smolagents Agent
+ python agent2.py
+ ```
+
+ ## 📁 Project Structure
+
+ ```
+ ├── agent.py # LlamaIndex-based agent with dynamic RAG
+ ├── agent2.py # Smolagents-based agent with observability
+ ├── appasync.py # Original async Gradio interface
+ ├── app.py # Original sync Gradio interface
+ ├── custom_models.py # Custom model implementations
+ ├── requirements.txt # Python dependencies
+ ├── README.md # This file
+ ```
+
+ ## 🧪 Testing
+
+ ### Run Individual Components
+ ```bash
+ # Test BAAI embedding
+ python -c "from custom_models import BaaiMultimodalEmbedding; print('BAAI OK')"
+
+ # Test Pixtral quantized
+ python -c "from custom_models import PixtralQuantizedLLM; print('Pixtral OK')"
+
+ # Test agents
+ python agent.py
+ python agent2.py
+ ```
+
+ ### Run GAIA Evaluation
+ ```bash
+ # Through the web interface
+ python app.py
+
+ # Or programmatically
+ python -c "
+ from agent2 import GAIAAgent
+ agent = GAIAAgent()
+ result = agent.solve_gaia_question({'Question': 'Test question', 'task_id': 'test'})
+ print(result)
+ "
+ ```
+
+ ## Customization
+
+ ### Adding New Models
+ 1. Create a new class in `custom_models.py`
+ 2. Implement the required interfaces
+ 3. Update the agent configuration
+
+ ### Modifying RAG Behavior
+ - Edit `DynamicQueryEngineManager` in `agent.py`
+ - Adjust reranking strategies in `HybridReranker`
+ - Configure search parameters in `enhanced_web_search_tool`
+
+ ### UI Customization
+ - Modify `app_unified.py` for interface changes
+ - Add new execution modes
+ - Integrate additional observability tools
+
+ ## 🐛 Troubleshooting
+
+ ### Common Issues
+
+ #### Model Loading Failures
+ - Check internet connectivity for model downloads
+ - Verify HuggingFace token permissions
+ - Clear model cache: `rm -rf ~/.cache/huggingface/`
+
+ #### Visual BGE Import Errors
+ ```bash
+ # Ensure proper installation
+ cd FlagEmbedding/research/visual_bge
+ pip install -e .
+ ```
+
+ ## 🔗 References
+
+ - [GAIA Benchmark](https://huggingface.co/datasets/gaia-benchmark/GAIA)
+ - [LlamaIndex](https://github.com/run-llama/llama_index)
+ - [BGE Models](https://github.com/FlagOpen/FlagEmbedding)
+ - [Gradio](https://github.com/gradio-app/gradio)
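
Note on the README's "Asynchronous Mode" bullet: the commit describes concurrent question processing but does not show how it is done. The sketch below is one minimal way to get that behavior with `asyncio`, assuming the agent exposes a blocking callable that takes a question dict with `Question` and `task_id` keys (the shape used in the README's own test snippet). `answer_all`, `max_concurrency`, and `fake_solve` are hypothetical names introduced for illustration, not code from the repository.

```python
import asyncio
from typing import Callable


async def answer_all(
    questions: list[dict],
    solve: Callable[[dict], str],
    max_concurrency: int = 4,
) -> list[tuple[str, str]]:
    """Run a blocking solver over many questions concurrently.

    Returns (task_id, answer) pairs in the same order as `questions`.
    """
    # Cap the number of questions in flight so a small machine is not swamped.
    semaphore = asyncio.Semaphore(max_concurrency)

    async def run_one(question: dict) -> tuple[str, str]:
        async with semaphore:
            # The solver is assumed to be blocking, so run it in a worker thread.
            answer = await asyncio.to_thread(solve, question)
            return question["task_id"], answer

    return await asyncio.gather(*(run_one(q) for q in questions))


def fake_solve(question: dict) -> str:
    # Stand-in for a real agent call; it only echoes the question text.
    return f"answer to {question['Question']}"


if __name__ == "__main__":
    demo = [{"task_id": str(i), "Question": f"Q{i}"} for i in range(3)]
    print(asyncio.run(answer_all(demo, fake_solve, max_concurrency=2)))
```

The semaphore is the main design choice here: it keeps the speed benefit of overlapping slow calls while bounding memory and API pressure, which matters on the resource-constrained environments the README targets.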