gaia-agent / README.md
Luong Huu Thanh
Update README.md
f063c68
---
title: GAIA Agent - Q&A Chatbot
emoji: ๐Ÿค–
colorFrom: green
colorTo: blue
sdk: gradio
sdk_version: 5.25.2
app_file: app.py
pinned: false
hf_oauth: true
# optional, default duration is 8 hours/480 minutes. Max duration is 30 days/43200 minutes.
hf_oauth_expiration_minutes: 480
---
# ๐Ÿค– **GAIA Agent - Advanced Q&A Chatbot**
## ๐ŸŒŸ **Introduction**
**GAIA Agent** is a sophisticated AI-powered chatbot system designed to handle complex questions and tasks through an intuitive Q&A interface. Built on top of the GAIA benchmark framework, this agent combines advanced reasoning, code execution, web search, document processing, and multimodal understanding capabilities. The system features both a user-friendly chatbot interface and a comprehensive evaluation runner for benchmark testing.
## ๐Ÿš€ **Key Features**
- **๐Ÿ” Multi-Modal Search**: Web search, Wikipedia, and arXiv paper search
- **๐Ÿ’ป Code Execution**: Support for Python, Bash, SQL, C, and Java
- **๐Ÿ–ผ๏ธ Image Processing**: Analysis, transformation, OCR, and generation
- **๐Ÿ“„ Document Processing**: PDF, CSV, Excel, and text file analysis
- **๐Ÿ“ File Upload Support**: Handle multiple file types with drag-and-drop
- **๐Ÿงฎ Mathematical Operations**: Complete set of mathematical tools
- **๐Ÿ’ฌ Conversational Interface**: Natural chat-based interaction
- **๐Ÿ“Š Evaluation System**: Automated benchmark testing and submission
## ๐Ÿ—๏ธ **Project Structure**
```
gaia-agent/
โ”œโ”€โ”€ app.py # Main Q&A chatbot interface
โ”œโ”€โ”€ evaluation_app.py # GAIA benchmark evaluation runner
โ”œโ”€โ”€ agent.py # Core agent implementation with tools
โ”œโ”€โ”€ code_interpreter.py # Multi-language code execution
โ”œโ”€โ”€ image_processing.py # Image processing utilities
โ”œโ”€โ”€ system_prompt.txt # System prompt for the agent
โ”œโ”€โ”€ requirements.txt # Python dependencies
โ”œโ”€โ”€ metadata.jsonl # GAIA benchmark metadata
โ”œโ”€โ”€ explore_metadata.ipynb # Data exploration notebook
โ””โ”€โ”€ README.md # This file
```
## ๐Ÿ› ๏ธ **Tool Categories**
### **๐ŸŒ Browser & Search Tools**
- **Wikipedia Search**: Search Wikipedia with up to 2 results
- **Web Search**: Tavily-powered web search with up to 3 results
- **arXiv Search**: Academic paper search with up to 3 results
### **๐Ÿ’ป Code Interpreter Tools**
- **Multi-Language Execution**: Python, Bash, SQL, C, Java support
- **Plot Generation**: Matplotlib visualization support
- **DataFrame Analysis**: Pandas data processing
- **Error Handling**: Comprehensive error reporting
### **๐Ÿงฎ Mathematical Tools**
- **Basic Operations**: Add, subtract, multiply, divide
- **Advanced Functions**: Modulus, power, square root
- **Complex Numbers**: Support for complex number operations
### **๐Ÿ“„ Document Processing Tools**
- **File Operations**: Save, read, and download files
- **CSV Analysis**: Pandas-based data analysis
- **Excel Processing**: Excel file analysis and processing
- **OCR**: Extract text from images using Tesseract
### **๐Ÿ–ผ๏ธ Image Processing & Generation Tools**
- **Image Analysis**: Size, color, and property analysis
- **Transformations**: Resize, rotate, crop, flip, adjust brightness/contrast
- **Drawing Tools**: Add shapes, text, and annotations
- **Image Generation**: Create gradients, noise patterns, and simple graphics
- **Image Combination**: Stack and combine multiple images
## ๐ŸŽฏ **How to Use**
### **Q&A Chatbot Interface (app.py)**
1. **Start the Chatbot:**
```bash
python app.py
```
2. **Access the Interface:**
- Open `http://localhost:7860` in your browser
- Upload files (images, documents, CSV, etc.) if needed
- Ask questions in natural language
- Get comprehensive answers with tool usage
3. **Supported Interactions:**
- **Text Questions**: "What is the capital of France?"
- **Math Problems**: "Calculate the square root of 144"
- **Code Requests**: "Write a Python function to sort a list"
- **Image Analysis**: Upload an image and ask "What do you see?"
- **Data Analysis**: Upload a CSV and ask "What are the trends?"
- **Web Search**: "What are the latest AI developments?"
### **Evaluation Runner (evaluation_app.py)**
1. **Run the Evaluation:**
```bash
python evaluation_app.py
```
2. **Benchmark Testing:**
- Log in with your Hugging Face account
- Click "Run Evaluation & Submit All Answers"
- Monitor progress as the agent processes GAIA benchmark questions
- View results and scores automatically
## ๐Ÿ”ง **Technical Architecture**
### **LangGraph State Machine**
```
START โ†’ Retriever โ†’ Assistant โ†’ Tools โ†’ Assistant
โ†‘ โ†“
โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜
```
1. **Retriever Node**: Searches vector database for similar questions
2. **Assistant Node**: LLM processes question with available tools
3. **Tools Node**: Executes selected tools (web search, code, etc.)
4. **Conditional Routing**: Dynamically routes between assistant and tools
### **Vector Database Integration**
- **Supabase Vector Store**: Stores GAIA benchmark Q&A pairs
- **Semantic Search**: Finds similar questions for context
- **HuggingFace Embeddings**: sentence-transformers/all-mpnet-base-v2
### **Multi-Modal File Support**
- **Images**: JPG, PNG, GIF, BMP, WebP
- **Documents**: PDF, DOC, DOCX, TXT, MD
- **Data**: CSV, Excel, JSON
- **Code**: Python, Bash, SQL, C, Java
## โš™๏ธ **Installation & Setup**
### **1. Clone Repository**
```bash
git clone https://github.com/fisherman611/gaia-agent.git
cd gaia-agent
```
### **2. Install Dependencies**
```bash
pip install -r requirements.txt
```
### **3. Environment Variables**
Create a `.env` file with your API keys:
```env
SUPABASE_URL=your_supabase_url
SUPABASE_SERVICE_ROLE_KEY=your_supabase_key
GROQ_API_KEY=your_groq_api_key
TAVILY_API_KEY=your_tavily_api_key
HUGGINGFACEHUB_API_TOKEN=your_hf_token
LANGSMITH_API_KEY=your_langsmith_key
LANGSMITH_TRACING=true
LANGSMITH_PROJECT=ai_agent_course
LANGSMITH_ENDPOINT=https://api.smith.langchain.com
```
### **4. Database Setup (Supabase)**
Execute this SQL in your Supabase database:
```sql
-- Enable pgvector extension
CREATE EXTENSION IF NOT EXISTS vector;
-- Create match function for documents2 table
CREATE OR REPLACE FUNCTION public.match_documents_2(
query_embedding vector(768)
)
RETURNS TABLE(
id bigint,
content text,
metadata jsonb,
embedding vector(768),
similarity double precision
)
LANGUAGE sql STABLE
AS $$
SELECT
id,
content,
metadata,
embedding,
1 - (embedding <=> query_embedding) AS similarity
FROM public.documents2
ORDER BY embedding <=> query_embedding
LIMIT 10;
$$;
-- Grant permissions
GRANT EXECUTE ON FUNCTION public.match_documents_2(vector) TO anon, authenticated;
```
## ๐Ÿš€ **Running the Application**
### **Chatbot Interface**
```bash
python app.py
```
Access at: `http://localhost:7860`
### **Evaluation Runner**
```bash
python evaluation_app.py
```
Access at: `http://localhost:7860`
### **Live Demo**
Try it online: [Hugging Face Space](https://huggingface.co/spaces/fisherman611/gaia-agent)
## ๐Ÿ”— **Resources**
- [GAIA Benchmark](https://huggingface.co/spaces/gaia-benchmark/leaderboard)
- [Hugging Face Agents Course](https://huggingface.co/agents-course)
- [LangGraph Documentation](https://langchain-ai.github.io/langgraph/)
- [Supabase Vector Store](https://supabase.com/docs/guides/ai/vector-columns)
## ๐Ÿค **Contributing**
Contributions are welcome! Areas for improvement:
- **New Tools**: Add specialized tools for specific domains
- **UI Enhancements**: Improve the chatbot interface
- **Performance**: Optimize response times and accuracy
- **Documentation**: Expand examples and use cases
## ๐Ÿ“„ **License**
This project is licensed under the [MIT License](https://mit-license.org/).