Omachoko
GAIA agent: ready for Hugging Face Spaces deployment
997480e
title: Enhanced GAIA Agent - Full Benchmark Implementation
emoji: 🚀
colorFrom: blue
colorTo: green
sdk: gradio
sdk_version: 4.44.0
app_file: app.py
pinned: false
license: mit

🚀 Enhanced GAIA Agent - Full Benchmark Implementation

Optimized to score 30%+ on the GAIA benchmark, with complete API integration

🎯 Overview

This is a comprehensive GAIA (General AI Assistants) agent implementation designed to reach the 30% score required for course certification. The agent features complete API integration, enhanced multi-step reasoning, and advanced tool orchestration.

✨ Key Enhancements

🔗 Full GAIA API Integration

  • ✅ Fetch questions from the course's GAIA scoring API (GET /questions)
  • ✅ Get a random question (GET /random-question)
  • ✅ Download task files (GET /files/{task_id})
  • ✅ Submit answers for official scoring (POST /submit)
  • ✅ Submit results to the leaderboard in real time
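
A minimal sketch of these four calls using only the Python standard library. The base URL is an assumption (the scoring Space used by the Hugging Face Agents course), and the submission body (`username`, `agent_code`, and `answers` entries with `task_id`/`submitted_answer`) is assumed to mirror the course's template app. Check both against the live API. The helper names are hypothetical, not the agent's own methods.

```python
import json
import urllib.request

# Assumption: base URL of the course's GAIA scoring Space; verify before use.
API_URL = "https://agents-course-unit4-scoring.hf.space"


def endpoint(path: str, base: str = API_URL) -> str:
    """Join the base URL and an endpoint path such as '/questions'."""
    return base.rstrip("/") + "/" + path.lstrip("/")


def get_json(path: str, timeout: float = 30.0):
    """GET an endpoint (e.g. /questions or /random-question) and decode its JSON body."""
    with urllib.request.urlopen(endpoint(path), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def download_file(task_id: str, timeout: float = 30.0) -> bytes:
    """GET /files/{task_id} and return the raw bytes of the task's attachment."""
    with urllib.request.urlopen(endpoint(f"/files/{task_id}"), timeout=timeout) as resp:
        return resp.read()


def build_submission(username: str, agent_code: str, answers: dict) -> dict:
    """Build the POST /submit body from a {task_id: answer} mapping (assumed shape)."""
    return {
        "username": username.strip(),
        "agent_code": agent_code,
        "answers": [
            {"task_id": tid, "submitted_answer": ans} for tid, ans in answers.items()
        ],
    }


def submit(payload: dict, timeout: float = 60.0):
    """POST the submission as JSON and return the decoded scoring response."""
    req = urllib.request.Request(
        endpoint("/submit"),
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Typical use is `questions = get_json("/questions")`, then one `submit(build_submission(...))` call once every answer is ready.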

🧠 Enhanced Multi-Step Reasoning

  • Advanced Workflow: Analyze → Plan → Act → Observe → Reason → Answer
  • Reasoning Memory: Maintains context across up to 15 reasoning steps
  • Question Classification: Automatic complexity assessment (Levels 1-3)
  • Tool Orchestration: Selects, chains, and executes the appropriate tools for each step
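
The workflow above can be sketched as a bounded loop that records every action and observation in a memory list. This is an illustration, not the agent's code. The `plan` callable, the tool dictionary, and the 15-step default are assumptions; the default is taken from the step limit stated elsewhere in this README.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

# A planner step is either ("tool", tool_name, argument) or ("answer", text, None).
Step = Tuple[str, str, Optional[str]]


@dataclass
class ReasoningMemory:
    """Ordered trace of (action, observation) pairs kept across steps."""
    entries: List[Tuple[str, str]] = field(default_factory=list)

    def add(self, action: str, observation: str) -> None:
        self.entries.append((action, observation))


def solve(
    question: str,
    plan: Callable[[str, ReasoningMemory], Step],
    tools: Dict[str, Callable[[str], str]],
    max_steps: int = 15,
) -> Tuple[str, ReasoningMemory]:
    """Plan -> Act -> Observe until the planner answers or the step budget runs out."""
    memory = ReasoningMemory()
    for _ in range(max_steps):
        kind, name, arg = plan(question, memory)   # Analyze/Plan using the memory so far
        if kind == "answer":                        # Reason -> Answer
            return name, memory
        tool = tools.get(name)
        if tool is None:
            observation = f"unknown tool: {name}"
        else:
            observation = tool(arg or "")           # Act
        memory.add(f"{name}({arg})", observation)   # Observe
    return "", memory  # budget exhausted without an answer
```

Keeping the planner separate from the loop means the same loop works whether `plan` is an LLM call or a hand-written rule, and the memory doubles as the reasoning trace shown in the UI.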

🛠️ Enhanced Tool Arsenal (9 Tools)

  1. 🧮 Enhanced Calculator - Complex mathematical operations
  2. 🌐 Enhanced Web Search - Expanded knowledge base (20+ countries)
  3. 🖼️ Image Analyzer - Visual content processing and spatial reasoning
  4. 📄 Document Reader - File content extraction
  5. 📁 File Processor - Download and process GAIA task files
  6. 📅 Date Calculator - Temporal reasoning and age calculations
  7. 🔄 Unit Converter - Length, temperature, and weight conversions
  8. 📝 Text Analyzer - Content analysis and pattern extraction
  9. 🧠 Reasoning Chain - Multi-step logical synthesis
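
A tool arsenal like this is often wired as a name-to-function registry. The sketch below covers two of the nine tools, and their behavior is assumed. The calculator evaluates arithmetic by walking Python's `ast` tree rather than calling `eval`. The date calculator counts whole years between two ISO dates. The registry, decorator, and tool names are hypothetical.

```python
import ast
import operator
from datetime import date
from typing import Callable, Dict

# Hypothetical registry: tool name -> function taking and returning a string.
TOOLS: Dict[str, Callable[[str], str]] = {}


def tool(name: str):
    """Decorator that registers a function in TOOLS under the given name."""
    def register(fn: Callable[[str], str]) -> Callable[[str], str]:
        TOOLS[name] = fn
        return fn
    return register


# Whitelisted operators; calls, names and attributes are rejected by _eval.
# Note: huge exponents such as 9**9**9 are not guarded against in this sketch.
_OPS = {
    ast.Add: operator.add, ast.Sub: operator.sub, ast.Mult: operator.mul,
    ast.Div: operator.truediv, ast.Pow: operator.pow, ast.Mod: operator.mod,
    ast.USub: operator.neg, ast.UAdd: operator.pos,
}


def _eval(node: ast.AST):
    """Recursively evaluate a parsed arithmetic expression."""
    if isinstance(node, ast.Expression):
        return _eval(node.body)
    if isinstance(node, ast.Constant) and type(node.value) in (int, float):
        return node.value
    if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
        return _OPS[type(node.op)](_eval(node.left), _eval(node.right))
    if isinstance(node, ast.UnaryOp) and type(node.op) in _OPS:
        return _OPS[type(node.op)](_eval(node.operand))
    raise ValueError(f"unsupported expression element: {type(node).__name__}")


@tool("calculator")
def calculator(expression: str) -> str:
    """Evaluate an arithmetic expression such as '(8 - 4) * 2.5'."""
    return str(_eval(ast.parse(expression, mode="eval")))


@tool("date_calculator")
def years_between(spec: str) -> str:
    """Whole years between two ISO dates given as 'YYYY-MM-DD,YYYY-MM-DD'."""
    start, end = (date.fromisoformat(s.strip()) for s in spec.split(","))
    years = end.year - start.year
    if (end.month, end.day) < (start.month, start.day):
        years -= 1  # anniversary not yet reached in the end year
    return str(years)


def run_tool(name: str, argument: str) -> str:
    """Dispatch to a registered tool; unknown names return an error string."""
    fn = TOOLS.get(name)
    return fn(argument) if fn else f"unknown tool: {name}"
```

With a registry, adding one of the remaining tools takes one decorated function. The dispatcher never changes.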

📊 Enhanced Knowledge Base

  • Geography: 20+ countries and capitals
  • Astronomy: Solar system facts, planet classifications (8 planets, 4 gas giants)
  • History: Key events (Berlin Wall fall 1989, Cold War end, etc.)
  • Mathematics: Constants (π, e, golden ratio) and conversion factors
  • Arts: Famous paintings and artists

🎯 GAIA Compliance Features

✅ Level 1: Basic Questions (<5 steps)

  • Simple mathematical calculations
  • Geographic knowledge queries
  • Basic factual lookups

✅ Level 2: Multi-Step Reasoning (5-10 steps)

  • Complex calculations with multiple components
  • Cross-domain knowledge synthesis
  • Tool coordination and chaining

✅ Level 3: Long-Term Planning

  • Advanced reasoning with up to 15 steps
  • File processing and analysis
  • Multi-modal understanding simulation
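
The level bands above can be turned into a simple classifier that maps an estimated step count to a level and a step budget. This is a heuristic illustration. The step estimate is assumed to come from the planner, and the per-level budgets are hypothetical values capped at the README's 15-step maximum. The agent's actual complexity assessment may differ.

```python
# Hypothetical per-level reasoning-step budgets, capped at 15.
STEP_BUDGET = {1: 5, 2: 10, 3: 15}


def gaia_level(estimated_steps: int) -> int:
    """Map an estimated number of steps to a GAIA-style level (1, 2 or 3)."""
    if estimated_steps < 0:
        raise ValueError("estimated_steps must be non-negative")
    if estimated_steps < 5:
        return 1  # Level 1: fewer than 5 steps
    if estimated_steps <= 10:
        return 2  # Level 2: 5 to 10 steps
    return 3      # Level 3: long-term planning


def step_budget(estimated_steps: int) -> int:
    """Reasoning-step limit to allow for a question of this estimated size."""
    return STEP_BUDGET[gaia_level(estimated_steps)]
```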

🚀 Performance Targets

| Metric | Target | Baseline | Status |
|---|---|---|---|
| Minimum Required | 30% | GPT-4 ~15% | 🎯 Optimized |
| Enhanced Target | 35-45% | Human ~92% | 📈 Achievable |
| Certification | 30%+ | Course Requirement | ✅ Ready |
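
A percentage threshold becomes a count of correct answers by rounding up. The helper below does that with integer arithmetic, which avoids floating-point surprises. The question count is an input. For example, assuming a 20-question evaluation set (check how many questions `GET /questions` actually returns), 30% means at least 6 correct answers.

```python
def answers_needed(total_questions: int, threshold_percent: int = 30) -> int:
    """Smallest count c of correct answers with c / total >= threshold_percent / 100."""
    if total_questions <= 0:
        raise ValueError("total_questions must be positive")
    # Integer ceiling division: -(-a // b) == ceil(a / b) for positive b.
    return -(-total_questions * threshold_percent // 100)


def score_percent(correct: int, total_questions: int) -> float:
    """Score as a percentage, comparable with the targets in the table."""
    return 100.0 * correct / total_questions
```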

🛠️ Technical Implementation

Core Components

  • gaia_agent.py: Enhanced agent with full capabilities (800+ lines)
  • app.py: Complete Gradio interface with API integration
  • requirements.txt: Enhanced dependencies for full functionality

Enhanced Dependencies

gradio==4.44.0          # UI framework (matches sdk_version in the metadata)
requests==2.31.0        # API connectivity
pandas==2.1.0           # Data processing
beautifulsoup4==4.12.2  # Content parsing
pillow==10.0.1          # Image processing
markdownify==0.11.6     # Document formatting

API Integration

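# `agent` is an instance of the agent class defined in gaia_agent.py
# Fetch questions from the GAIA API
questions = agent.get_questions()

# Answer one question; task_id lets the agent download the task's attached file
answer = agent.query(question, task_id="task_123")

# Submit all accumulated answers for scoring
result = agent.submit_answer(username, agent_code_url, answers)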
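
In practice these calls are combined into one pass. The pass answers every fetched question, collects the answers in submission format, and submits once. The sketch below assumes each question is a dict with `task_id` and `question` keys. It uses a hypothetical `StubAgent` with the same method names so the flow runs offline. The real agent's return values and `submit_answer` signature may differ.

```python
from typing import Dict, List


class StubAgent:
    """Hypothetical offline stand-in exposing the method names used above."""

    def get_questions(self) -> List[Dict[str, str]]:
        return [
            {"task_id": "task_1", "question": "What is the capital of Germany?"},
            {"task_id": "task_2", "question": "Convert 100 degrees Celsius to Fahrenheit"},
        ]

    def query(self, question: str, task_id: str = "") -> str:
        canned = {"task_1": "Berlin", "task_2": "212"}
        return canned.get(task_id, "")

    def submit_answer(self, username: str, agent_code_url: str, answers: List[Dict[str, str]]) -> Dict:
        return {"username": username, "agent_code": agent_code_url, "count": len(answers)}


def answer_all(agent) -> List[Dict[str, str]]:
    """Query the agent for every question and collect answers for submission."""
    answers = []
    for item in agent.get_questions():
        try:
            answer = agent.query(item["question"], task_id=item["task_id"])
        except Exception as exc:  # one failing question should not abort the run
            print(f"{item['task_id']}: failed ({exc})")
            answer = ""
        answers.append({"task_id": item["task_id"], "submitted_answer": answer})
    return answers


agent = StubAgent()
answers = answer_all(agent)
result = agent.submit_answer(
    "your-hf-username",
    "https://huggingface.co/spaces/your-hf-username/your-space/tree/main",
    answers,
)
print(result)
```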

📱 User Interface

🎯 GAIA Questions Tab

  • Fetch real questions from GAIA API
  • Automatic file download and processing
  • Enhanced reasoning with memory display

✏️ Manual Input Tab

  • Test custom questions
  • Example questions for different complexity levels
  • Immediate processing and feedback

📊 Submission & Scoring Tab

  • Official GAIA leaderboard submission
  • Progress tracking and statistics
  • Performance monitoring

🛠️ Agent Details Tab

  • Complete capability documentation
  • Tool descriptions and examples
  • Performance benchmarks

🧪 Example Capabilities

Mathematical Reasoning

Q: If there are 8 planets and 4 are gas giants, how many are not gas giants?
A: 4

Geographic Knowledge

Q: What is the capital of Germany?
A: Berlin

Historical Research

Q: Who was the US president when the Berlin Wall fell?
A: George H.W. Bush

Unit Conversion

Q: Convert 100 degrees Celsius to Fahrenheit
A: 212.0
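
The temperature example can be reproduced with a small converter. This is a sketch under assumptions: the unit abbreviations are illustrative, and the approach of converting through a base unit (metres, kilograms) may not match the agent's own converter. Temperature scales differ by an offset as well as a factor, so they go through Celsius instead of a multiplication table.

```python
# Linear units: factor that converts one unit to the base unit of its dimension.
LENGTH_TO_M = {"mm": 0.001, "cm": 0.01, "m": 1.0, "km": 1000.0,
               "in": 0.0254, "ft": 0.3048, "mi": 1609.344}
WEIGHT_TO_KG = {"g": 0.001, "kg": 1.0, "lb": 0.45359237, "oz": 0.028349523125}
TEMP_UNITS = {"C", "F", "K"}


def _to_celsius(value: float, unit: str) -> float:
    if unit == "C":
        return value
    if unit == "F":
        return (value - 32) * 5 / 9
    return value - 273.15  # unit == "K"


def _from_celsius(value: float, unit: str) -> float:
    if unit == "C":
        return value
    if unit == "F":
        return value * 9 / 5 + 32
    return value + 273.15  # unit == "K"


def convert(value: float, src: str, dst: str) -> float:
    """Convert value from unit src to unit dst within a single dimension."""
    if src in TEMP_UNITS and dst in TEMP_UNITS:
        return _from_celsius(_to_celsius(value, src), dst)
    for table in (LENGTH_TO_M, WEIGHT_TO_KG):
        if src in table and dst in table:
            return value * table[src] / table[dst]
    raise ValueError(f"cannot convert {src} to {dst}")
```

For example, `convert(100, "C", "F")` gives `212.0`, matching the answer above.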

🎯 Usage Instructions

1. Setup Environment

pip install -r requirements.txt
python app.py

2. Fetch GAIA Questions

  • Click "Get Random Question" to fetch from API
  • Questions include task ID and associated files
  • Files are automatically downloaded and processed

3. Process Questions

  • The enhanced agent uses up to 15 reasoning steps
  • The agent selects and chains multiple tools as needed
  • Reasoning memory is displayed for transparency

4. Submit for Scoring

  • Provide Hugging Face username
  • Include agent code URL (your Space link)
  • Submit accumulated answers for official scoring

🏆 Certification Ready

This implementation is specifically optimized to reach the 30% score required for course certification:

  • ✅ Complete API Integration - Connects to the course's GAIA scoring endpoints
  • ✅ Enhanced Reasoning - Multi-tool workflow of up to 15 steps
  • ✅ Expanded Knowledge - Knowledge base of 50+ entries
  • ✅ File Processing - Handles task-associated files
  • ✅ Clean Formatting - Answers prepared for exact-match scoring
  • ✅ Progress Tracking - Real-time performance monitoring
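
GAIA scores answers by exact match, so small formatting differences such as a trailing period or `212.0` versus `212` can cost points. A minimal normalizer might look like the sketch below. The `FINAL ANSWER:` prefix it strips and its rules are assumptions, not the agent's exact formatting logic. It rewrites whole-number floats as integers and leaves comma-separated lists alone.

```python
import re

# Assumed answer prefix emitted by the model, matched case-insensitively.
_PREFIX = re.compile(r"^\s*final answer\s*:\s*", re.IGNORECASE)
# Plain numbers only; thousands separators and lists are left untouched.
_NUMBER = re.compile(r"^[-+]?\d+(\.\d+)?$")


def normalize_answer(raw: str) -> str:
    """Clean a model output into a short exact-match answer string."""
    text = _PREFIX.sub("", raw.strip())
    # Drop surrounding quotes and a trailing period.
    text = text.strip().strip('"').strip("'").rstrip(".").strip()
    if _NUMBER.match(text):
        number = float(text)
        # 212.0 -> "212"; genuine decimals such as 3.5 are kept
        return str(int(number)) if number.is_integer() else repr(number)
    return " ".join(text.split())  # collapse internal whitespace
```

Stripping a trailing period also affects answers that legitimately end in one (for example "Jr."), so a real implementation may need exceptions.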

📊 Optimization Results

| Component | Before | After | Improvement |
|---|---|---|---|
| Tools | 5 basic | 9 enhanced | +80% (4 more tools) |
| Knowledge Base | 8 entries | 50+ entries | 6x+ more entries |
| Reasoning Steps | 10 max | 15 max | +50% depth |
| API Integration | None | Full | Complete |
| File Support | None | TXT/JSON/CSV | New |

🎯 Ready for GAIA Benchmark - Targeting 30%+ Performance for Course Certification

Modular GAIA Agent

A production-ready, GAIA benchmark-compliant agent for Hugging Face's AI Agents course. Handles multi-modal questions, file downloads, and tool chaining with strict GAIA output formatting.

Features

  • Modular tool/LLM registry (easy to extend)
  • Hugging Face models for LLM reasoning, question answering (QA), table QA, automatic speech recognition (ASR), and image captioning
  • File download/caching and type routing
  • Multi-step reasoning and tool chaining
  • GAIA-compliant output and reasoning trace
  • Advanced YouTube/Video QA: Frame extraction, object detection (YOLOv8), image captioning (BLIP), and audio transcription (Whisper)
  • Robust error handling and logging: All errors are logged to gaia_agent.log and user-friendly messages are returned
  • Isolated code execution: Python code runs in a subprocess with a timeout and resource limits
  • Automated testing: Unit and integration tests with pytest
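
File download, caching, and type routing can be sketched as follows. Each task's file is cached under a local directory keyed by task ID, and a handler kind is chosen from the file extension. The cache directory reuses `gaia_agent_files/` from the File Structure list. The extension table, handler names, and `fetch` callback are hypothetical, and the agent's real routing may differ.

```python
from pathlib import Path
from typing import Callable

# Hypothetical mapping from file extension to the kind of handler to use.
ROUTES = {
    ".txt": "text", ".md": "text", ".json": "json", ".csv": "table",
    ".xlsx": "table", ".py": "code", ".png": "image", ".jpg": "image",
    ".jpeg": "image", ".mp3": "audio", ".wav": "audio", ".mp4": "video",
    ".pdf": "document",
}


def route(path: Path) -> str:
    """Return the handler kind for a file based on its lower-cased extension."""
    return ROUTES.get(path.suffix.lower(), "unknown")


def cached_file(
    task_id: str,
    filename: str,
    fetch: Callable[[str], bytes],
    cache_dir: Path = Path("gaia_agent_files"),
) -> Path:
    """Return a local path for the task's file, calling fetch(task_id) only on a cache miss."""
    target = cache_dir / task_id / Path(filename).name  # .name drops directory parts
    if not target.exists():
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_bytes(fetch(task_id))
    return target
```

Passing `fetch` in as a callback keeps the cache independent of how files are downloaded. For example, it could wrap a `GET /files/{task_id}` request.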

Usage

Install dependencies

pip install -r requirements.txt
# Also install yt-dlp (for YouTube/video QA)
pip install yt-dlp
# Download YOLOv8 weights if needed
python -c "from ultralytics import YOLO; YOLO('yolov8n.pt')"

Run the agent

from gaia_agent import ModularGAIAAgent
agent = ModularGAIAAgent()
results = agent.run(from_api=True)
for r in results:
    print(r)

Run the Gradio UI

python app.py

Run tests

pytest tests/

Debugging and Logging

  • All errors and important events are logged to gaia_agent.log.
  • Set the agent's debug flag for verbose output (see code).
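
A logging setup matching this description might look like the following. The `setup_logging` and `friendly_error` helpers and the `debug` parameter are hypothetical stand-ins for the agent's actual debug flag. Only the log file name `gaia_agent.log` comes from this README.

```python
import logging


def setup_logging(debug: bool = False, log_file: str = "gaia_agent.log") -> logging.Logger:
    """Always log to a file; also log to the console when debug is on."""
    logger = logging.getLogger("gaia_agent")
    logger.setLevel(logging.DEBUG if debug else logging.INFO)
    for handler in list(logger.handlers):  # avoid duplicates if called twice
        logger.removeHandler(handler)
        handler.close()
    fmt = logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")

    file_handler = logging.FileHandler(log_file, encoding="utf-8")
    file_handler.setFormatter(fmt)
    logger.addHandler(file_handler)

    if debug:
        console = logging.StreamHandler()
        console.setFormatter(fmt)
        logger.addHandler(console)
    return logger


def friendly_error(logger: logging.Logger, exc: Exception) -> str:
    """Log the full traceback (call from an except block) and return a short UI message."""
    logger.exception("tool failed: %s", exc)
    return f"Sorry, something went wrong ({type(exc).__name__}). See gaia_agent.log."
```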

Security

  • Python code is executed in a subprocess with a timeout (default 5s).
  • For extra safety, consider running the agent in a containerized environment.
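
Running untrusted Python in a child process with a timeout needs only the standard library, as sketched below. The 5-second default mirrors this README. The memory cap via `resource.setrlimit` is an assumption: it works only on POSIX systems and is skipped elsewhere. A subprocess is not a real sandbox, which is why the container advice still applies.

```python
import subprocess
import sys
from typing import Tuple

try:
    import resource  # POSIX only
except ImportError:  # e.g. Windows
    resource = None


def _limit_memory(max_bytes: int = 1024 * 1024 * 1024) -> None:
    """Runs in the child before exec: cap its address space where supported."""
    try:
        resource.setrlimit(resource.RLIMIT_AS, (max_bytes, max_bytes))
    except (ValueError, OSError):
        pass  # some platforms refuse or ignore this limit


def run_python(code: str, timeout: float = 5.0) -> Tuple[bool, str]:
    """Execute code in a fresh interpreter; return (ok, stdout or last error line)."""
    try:
        proc = subprocess.run(
            # -I (isolated mode): ignore PYTHON* env vars and user site-packages
            [sys.executable, "-I", "-c", code],
            capture_output=True,
            text=True,
            timeout=timeout,
            preexec_fn=_limit_memory if resource is not None else None,
        )
    except subprocess.TimeoutExpired:
        return False, f"timed out after {timeout}s"
    if proc.returncode != 0:
        err = proc.stderr.strip()
        return False, err.splitlines()[-1] if err else "error"
    return True, proc.stdout.strip()
```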

File Structure

  • gaia_agent.py: Main agent logic
  • requirements.txt: Dependencies
  • README.md: This file
  • app.py: Gradio UI
  • tests/: Automated tests
  • gaia_agent_files/: Example/context files

Notes

  • Requires a Hugging Face token for some models/APIs
  • Designed for easy extension and robust, production use
  • For video QA, ensure yt-dlp and YOLOv8 weights are available