title: Enhanced GAIA Agent - Full Benchmark Implementation
colorFrom: blue
colorTo: green
sdk: gradio
sdk_version: 4.44.0
app_file: app.py
pinned: false
license: mit
Enhanced GAIA Agent - Full Benchmark Implementation
Optimized for 30%+ accuracy on the GAIA benchmark, with complete API integration
Overview
This is a GAIA (General AI Assistants) agent implementation designed to reach the 30% score required for course certification. The agent features complete API integration, enhanced multi-step reasoning, and tool orchestration.
Key Enhancements
Full GAIA API Integration
- Fetch questions from the official GAIA API (GET /questions)
- Get random questions (GET /random-question)
- Download task files (GET /files/{task_id})
- Submit answers for official scoring (POST /submit)
- Real-time leaderboard submission
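The four endpoints above could be wrapped in a small client along these lines. This is a minimal sketch, not the code in gaia_agent.py: the class name `GaiaApiClient`, the `base_url` argument, the injectable `opener`, and the assumption that every endpoint except the file download returns JSON are all illustrative choices. It uses only the standard library so it runs without extra dependencies.

```python
import json
from urllib import request
from urllib.parse import quote


class GaiaApiClient:
    """Hypothetical thin wrapper around the four endpoints listed above."""

    def __init__(self, base_url, opener=request.urlopen, timeout=30):
        self.base_url = base_url.rstrip("/")
        self._open = opener  # injectable, so the client can be exercised offline
        self.timeout = timeout

    def _url(self, path):
        return f"{self.base_url}{path}"

    def _get_json(self, path):
        # Assumption: the endpoint answers with a JSON body.
        with self._open(self._url(path), timeout=self.timeout) as resp:
            return json.loads(resp.read().decode("utf-8"))

    def get_questions(self):
        return self._get_json("/questions")

    def get_random_question(self):
        return self._get_json("/random-question")

    def download_file(self, task_id):
        # Percent-encode the task id so it cannot alter the URL path.
        url = self._url(f"/files/{quote(task_id, safe='')}")
        with self._open(url, timeout=self.timeout) as resp:
            return resp.read()  # raw bytes; the caller decides how to parse them

    def submit(self, payload):
        # The payload layout is not documented here, so it is passed through as given.
        req = request.Request(
            self._url("/submit"),
            data=json.dumps(payload).encode("utf-8"),
            headers={"Content-Type": "application/json"},
            method="POST",
        )
        with self._open(req, timeout=self.timeout) as resp:
            return json.loads(resp.read().decode("utf-8"))
```

Passing the opener in as a parameter keeps network access out of the class itself, which makes it straightforward to substitute a stub in tests.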
Enhanced Multi-Step Reasoning
- Advanced Workflow: Analyze → Plan → Act → Observe → Reason → Answer
- Reasoning Memory: Maintains context across 15+ reasoning steps
- Question Classification: Automatic complexity assessment (Level 1-3)
- Tool Orchestration: Intelligent tool selection and execution
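The workflow and reasoning memory described above might be structured roughly as follows. This is a sketch under stated assumptions: `ReasoningMemory`, `run_workflow`, and the `act` callable are hypothetical names, and the cap of 15 steps is taken from the figures quoted in this README rather than from the agent's code.

```python
from dataclasses import dataclass, field


@dataclass
class ReasoningMemory:
    """Ordered record of (phase, note) pairs, capped at max_steps."""
    max_steps: int = 15
    steps: list = field(default_factory=list)

    def add(self, phase, note):
        if len(self.steps) >= self.max_steps:
            return False  # step budget exhausted
        self.steps.append((phase, note))
        return True


def run_workflow(question, act, max_steps=15):
    """Analyze, plan, then loop over act/observe/reason until an answer appears.

    `act` is a hypothetical callable: act(question, memory) returns
    (observation, answer), where answer is None while more work is needed.
    """
    memory = ReasoningMemory(max_steps=max_steps)
    memory.add("analyze", f"question: {question}")
    memory.add("plan", "choose the next tool from what is known so far")
    answer = None
    while answer is None:
        if not memory.add("act", "call tool"):
            break  # out of budget: stop without an answer
        observation, answer = act(question, memory)
        memory.add("observe", observation)
        memory.add("reason", "answer found" if answer is not None else "continue")
    return answer, memory.steps
```

Returning the recorded steps alongside the answer is what would let a UI display the reasoning trace.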
Enhanced Tool Arsenal (9 Tools)
- Enhanced Calculator - Complex mathematical operations
- Enhanced Web Search - Expanded knowledge base (20+ countries)
- Image Analyzer - Visual content processing and spatial reasoning
- Document Reader - File content extraction
- File Processor - Download and process GAIA task files
- Date Calculator - Temporal reasoning and age calculations
- Unit Converter - Length, temperature, weight conversions
- Text Analyzer - Content analysis and pattern extraction
- Reasoning Chain - Multi-step logical synthesis
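As an illustration of one tool from the list above, the age calculation attributed to the Date Calculator could be written as below. The function names `age_on` and `days_between` are hypothetical; the agent's actual tool interface is not shown in this README.

```python
from datetime import date


def age_on(birth, on):
    """Whole years elapsed between `birth` and the reference date `on`."""
    if on < birth:
        raise ValueError("reference date precedes birth date")
    # Compare (month, day) tuples to see whether this year's birthday has passed.
    had_birthday = (on.month, on.day) >= (birth.month, birth.day)
    return on.year - birth.year - (0 if had_birthday else 1)


def days_between(start, end):
    """Signed number of days from `start` to `end`."""
    return (end - start).days
```

For example, `age_on(date(1989, 11, 9), date(2019, 11, 8))` gives 29, because the anniversary falls one day later.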
Enhanced Knowledge Base
- Geography: 20+ countries and capitals
- Astronomy: Solar system facts, planet classifications (8 planets, 4 gas giants)
- History: Key events (Berlin Wall fall 1989, Cold War end, etc.)
- Mathematics: Constants (π, e, golden ratio) and conversion factors
- Arts: Famous paintings and artists
GAIA Compliance Features
Level 1: Basic Questions (<5 steps)
- Simple mathematical calculations
- Geographic knowledge queries
- Basic factual lookups
Level 2: Multi-Step Reasoning (5-10 steps)
- Complex calculations with multiple components
- Cross-domain knowledge synthesis
- Tool coordination and chaining
Level 3: Long-Term Planning
- Advanced reasoning with 15+ steps
- File processing and analysis
- Multi-modal understanding simulation
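The step thresholds above suggest a simple mapping from an estimated step count to a level. The sketch below assumes such an estimate is already available (how the agent produces it is not described here) and treats anything above 10 steps as Level 3, since the README leaves the 11 to 14 range unspecified. `classify_level` is a hypothetical name.

```python
def classify_level(estimated_steps):
    """Map an estimated number of reasoning steps to a GAIA level (1-3)."""
    if estimated_steps < 1:
        raise ValueError("estimated_steps must be at least 1")
    if estimated_steps < 5:
        return 1  # basic questions
    if estimated_steps <= 10:
        return 2  # multi-step reasoning
    return 3      # long-term planning (assumed for every count above 10)
```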
Performance Targets
Metric | Target | Baseline | Status |
---|---|---|---|
Minimum Required | 30% | GPT-4 ~15% | Optimized |
Enhanced Target | 35-45% | Human ~92% | Achievable |
Certification | 30%+ | Course Requirement | Ready |
Technical Implementation
Core Components
- gaia_agent.py: Enhanced agent with full capabilities (800+ lines)
- app.py: Complete Gradio interface with API integration
- requirements.txt: Enhanced dependencies for full functionality
Enhanced Dependencies
gradio==4.44.0 # UI framework
requests==2.31.0 # API connectivity
pandas==2.1.0 # Data processing
beautifulsoup4==4.12.2 # Content parsing
pillow==10.0.1 # Image processing
markdownify==0.11.6 # Document formatting
API Integration
# Fetch questions
questions = agent.get_questions()
# Process with file support
answer = agent.query(question, task_id="task_123")
# Submit for scoring
result = agent.submit_answer(username, agent_code_url, answers)
User Interface
GAIA Questions Tab
- Fetch real questions from GAIA API
- Automatic file download and processing
- Enhanced reasoning with memory display
Manual Input Tab
- Test custom questions
- Example questions for different complexity levels
- Immediate processing and feedback
Submission & Scoring Tab
- Official GAIA leaderboard submission
- Progress tracking and statistics
- Performance monitoring
Agent Details Tab
- Complete capability documentation
- Tool descriptions and examples
- Performance benchmarks
Example Capabilities
Mathematical Reasoning
Q: If there are 8 planets and 4 are gas giants, how many are not gas giants?
A: 4
Geographic Knowledge
Q: What is the capital of Germany?
A: Berlin
Historical Research
Q: Who was the US president when the Berlin Wall fell?
A: George H.W. Bush
Complex Calculations
Q: Convert 100 degrees Celsius to Fahrenheit
A: 212.0
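The last example follows from the standard formula F = C × 9/5 + 32. A minimal sketch of that conversion, with hypothetical function names, shows why the answer appears as 212.0 rather than 212: Python's true division yields a float.

```python
def celsius_to_fahrenheit(celsius):
    """Convert degrees Celsius to degrees Fahrenheit."""
    return celsius * 9 / 5 + 32


def fahrenheit_to_celsius(fahrenheit):
    """Inverse conversion, included for symmetry."""
    return (fahrenheit - 32) * 5 / 9


print(celsius_to_fahrenheit(100))  # 212.0
```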
Usage Instructions
1. Setup Environment
pip install -r requirements.txt
python app.py
2. Fetch GAIA Questions
- Click "Get Random Question" to fetch from API
- Questions include task ID and associated files
- Files are automatically downloaded and processed
3. Process Questions
- The enhanced agent reasons for up to 15 steps
- Multiple tools are orchestrated intelligently
- Reasoning memory is displayed for transparency
4. Submit for Scoring
- Provide Hugging Face username
- Include agent code URL (your Space link)
- Submit accumulated answers for official scoring
Certification Ready
This implementation is specifically optimized to achieve the 30% target performance required for course certification:
- Complete API Integration - Connects to official GAIA endpoints
- Enhanced Reasoning - 15-step multi-tool workflow
- Expanded Knowledge - Comprehensive knowledge base
- File Processing - Handles task-associated files
- Clean Formatting - Exact match answer preparation
- Progress Tracking - Real-time performance monitoring
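Because scoring relies on exact matches, the answer-preparation step presumably strips anything a model adds around the bare answer. The sketch below shows one plausible set of rules; `clean_answer` and the specific rules are assumptions, not the agent's documented behavior.

```python
import re


def clean_answer(raw):
    """Reduce a model response to a bare answer string for exact matching."""
    text = raw.strip()
    # Drop a leading label such as "Final answer:" or "Answer:".
    text = re.sub(r"^(final\s+answer|answer)\s*:\s*", "", text, flags=re.IGNORECASE)
    # Remove wrapping quotes, then a trailing period.
    text = text.strip("\"'").rstrip(".").strip()
    # Remove thousands separators from plain numbers: "1,234" becomes "1234".
    if re.fullmatch(r"-?\d{1,3}(,\d{3})+(\.\d+)?", text):
        text = text.replace(",", "")
    return text
```

Note that the trailing-period rule would also shorten an answer such as "U.S." to "U.S", so a real implementation would need to handle abbreviations more carefully.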
Optimization Results
Component | Before | After | Improvement |
---|---|---|---|
Tools | 5 basic | 9 enhanced | +80% capability |
Knowledge Base | 8 entries | 50+ entries | +500% coverage |
Reasoning Steps | 10 max | 15 max | +50% depth |
API Integration | None | Full | Complete |
File Support | None | TXT/JSON/CSV | Advanced |
Ready for GAIA Benchmark - Targeting 30%+ Performance for Course Certification
Modular GAIA Agent
A production-ready, GAIA benchmark-compliant agent for Hugging Face's AI Agents course. Handles multi-modal questions, file downloads, and tool chaining with strict GAIA output formatting.
Features
- Modular tool/LLM registry (easy to extend)
- Best-in-class Hugging Face models for LLM, QA, table QA, ASR, image captioning
- File download/caching and type routing
- Multi-step reasoning and tool chaining
- GAIA-compliant output and reasoning trace
- Advanced YouTube/Video QA: Frame extraction, object detection (YOLOv8), image captioning (BLIP), and audio transcription (Whisper)
- Robust error handling and logging: All errors are logged to gaia_agent.log and user-friendly messages are returned
- Secure code execution: Python code is run in a subprocess with timeout and resource limits
- Automated testing: Unit and integration tests with pytest
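A modular tool registry combined with file type routing, as listed above, is often built from a dictionary of callables keyed by name plus a table mapping file extensions to tool names. The following is a sketch of that general pattern only; `TOOLS`, `ROUTES`, `register_tool`, and `route_file` are hypothetical names, and the three file types are the ones this README mentions (TXT, JSON, CSV).

```python
import csv
import json
from pathlib import Path

TOOLS = {}  # tool name -> callable taking a file path


def register_tool(name):
    """Decorator that adds a function to the registry under `name`."""
    def decorator(func):
        TOOLS[name] = func
        return func
    return decorator


@register_tool("text")
def read_text(path):
    return Path(path).read_text(encoding="utf-8")


@register_tool("json")
def read_json(path):
    return json.loads(Path(path).read_text(encoding="utf-8"))


@register_tool("csv")
def read_csv(path):
    with open(path, newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))


# File extension -> registered tool name.
ROUTES = {".txt": "text", ".json": "json", ".csv": "csv"}


def route_file(path):
    """Dispatch a downloaded file to the tool registered for its extension."""
    suffix = Path(path).suffix.lower()
    try:
        tool = TOOLS[ROUTES[suffix]]
    except KeyError:
        raise ValueError(f"no tool registered for {suffix!r} files") from None
    return tool(path)
```

Supporting a new file type then means registering one function and adding one entry to `ROUTES`, without touching the dispatch code.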
Usage
Install dependencies
pip install -r requirements.txt
# Also install yt-dlp (for YouTube/video QA)
pip install yt-dlp
# Download YOLOv8 weights if needed
python -c "from ultralytics import YOLO; YOLO('yolov8n.pt')"
Run the agent
from gaia_agent import ModularGAIAAgent
agent = ModularGAIAAgent()
results = agent.run(from_api=True)
for r in results:
    print(r)
Run the Gradio UI
python app.py
Run tests
pytest tests/
Debugging and Logging
- All errors and important events are logged to gaia_agent.log.
- Set the agent's debug flag for verbose output (see code).
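A logging setup matching this description could look like the sketch below, built on the standard `logging` module. The function name `configure_logging`, the logger name "gaia_agent", and the way the debug flag maps to a log level are assumptions; check the agent's code for the actual flag.

```python
import logging


def configure_logging(debug=False, log_path="gaia_agent.log"):
    """Send agent log records to a file; debug=True enables verbose output."""
    logger = logging.getLogger("gaia_agent")  # hypothetical logger name
    logger.setLevel(logging.DEBUG if debug else logging.INFO)
    if not logger.handlers:  # avoid stacking duplicate handlers on repeated calls
        handler = logging.FileHandler(log_path, encoding="utf-8")
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")
        )
        logger.addHandler(handler)
    return logger
```

Calling `logger.exception(...)` inside an `except` block would additionally record the traceback in the log file while the caller returns a short, user-friendly message.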
Security
- Python code is executed in a subprocess with a timeout (default 5s).
- For extra safety, consider running the agent in a containerized environment.
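The subprocess-with-timeout approach can be sketched with `subprocess.run`. This covers only the timeout: the resource limits mentioned under Features are not shown, and a timeout alone does not sandbox file or network access. `run_python` is a hypothetical name; the 5-second default mirrors the value stated above.

```python
import subprocess
import sys


def run_python(code, timeout=5.0):
    """Run `code` in a separate interpreter and return (ok, output)."""
    try:
        proc = subprocess.run(
            # -I starts Python in isolated mode (ignores user site-packages
            # and PYTHON* environment variables).
            [sys.executable, "-I", "-c", code],
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return False, f"timed out after {timeout} seconds"
    if proc.returncode != 0:
        return False, proc.stderr.strip()
    return True, proc.stdout.strip()
```

On timeout, `subprocess.run` kills the child process before raising, so a runaway snippet does not keep running in the background.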
File Structure
- gaia_agent.py: Main agent logic
- requirements.txt: Dependencies
- README.md: This file
- app.py: Gradio UI
- tests/: Automated tests
- gaia_agent_files/: Example/context files
Notes
- Requires a Hugging Face token for some models/APIs
- Designed for easy extension and robust, production use
- For video QA, ensure yt-dlp and YOLOv8 weights are available