title: GAIA Benchmark Agent - Final Assessment
emoji: 🕵🏻‍♂️
colorFrom: indigo
colorTo: indigo
sdk: gradio
sdk_version: 5.25.2
app_file: app.py
pinned: false
hf_oauth: true
hf_oauth_expiration_minutes: 480
AI Agent for GAIA Benchmark
Final assessment for the Hugging Face AI Agents course
This repository contains a fully implemented autonomous agent designed to solve Level 1 of the GAIA benchmark. The agent uses large language models and a suite of external tools to tackle complex, real-world, multi-modal tasks. It is ready to run and submit answers to the GAIA evaluation server, and it can be deployed as a Hugging Face Space with a Gradio interface.
Project Summary
- Purpose: Automatically solve and submit answers for the GAIA benchmark, which evaluates generalist AI agents on tasks requiring reasoning, code execution, web search, data analysis, and more.
- Features:
  - Uses LLMs (OpenAI, HuggingFace, etc.) for reasoning and planning
  - Integrates multiple tools: web search, Wikipedia, Python code execution, YouTube transcript, and more
  - Handles file-based and multi-modal tasks
  - Submits results and displays scores in a user-friendly Gradio interface
How to Run
On HuggingFace Spaces:
- Log in with your HuggingFace account.
- Click "Run Evaluation & Submit All Answers" to evaluate the agent on the GAIA benchmark and see your score.
Locally:
pip install -r requirements.txt
python app.py
About GAIA
GAIA is a challenging benchmark for evaluating the capabilities of generalist AI agents on real-world, multi-step, and multi-modal tasks. Each task may require code execution, web search, data analysis, or other tool use. This agent is designed to autonomously solve such tasks and submit answers for evaluation.
Architecture
- app.py: Gradio app and evaluation logic. Fetches questions, runs the agent, and submits answers.
- agent.py: MainAgent class. Implements reasoning, tool use, and answer formatting.
- model.py: Loads and manages LLM backends (OpenAI, HuggingFace, LiteLLM, etc.).
- tools.py: Implements external tools.
- utils/logger.py: Logging utility.
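The flow described for app.py (fetch questions, run the agent, submit answers) can be sketched roughly as below. This is a minimal illustration, not the repository's actual code: the function name `run_agent_on_questions`, the `echo_agent` stand-in, and the `task_id`, `question`, and `submitted_answer` keys are all assumptions made for the example, and the network calls that fetch questions and post answers are left out.

```python
from typing import Callable, Dict, List


def run_agent_on_questions(
    questions: List[Dict[str, str]],
    agent: Callable[[str], str],
) -> List[Dict[str, str]]:
    """Run the agent on each question and collect answers for submission.

    The dictionary keys used here are assumed for illustration only.
    """
    answers: List[Dict[str, str]] = []
    for item in questions:
        task_id = item.get("task_id")
        question = item.get("question")
        if not task_id or question is None:
            continue  # skip malformed entries rather than failing the whole run
        try:
            answer = agent(question)
        except Exception as exc:
            # Record the failure so one bad task does not abort the evaluation.
            answer = f"AGENT ERROR: {exc}"
        answers.append({"task_id": task_id, "submitted_answer": answer})
    return answers


def echo_agent(question: str) -> str:
    """Hypothetical stand-in for the real agent: upper-cases the question."""
    return question.upper()


if __name__ == "__main__":
    sample = [{"task_id": "t1", "question": "What is 2 + 2?"}]
    print(run_agent_on_questions(sample, echo_agent))
```

Keeping the per-question loop separate from fetching and submitting makes it possible to try the agent on a handful of local questions before running a full evaluation.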
Environment Variables
Some models require API keys. Set these in your Space or local environment:
- OPENAI_API_KEY and OPENAI_API_BASE (for OpenAI models)
- HUGGINGFACEHUB_API_TOKEN (for HuggingFace Hub models)
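Before launching the app, it can help to check that the variables for the chosen backend are actually set. The sketch below is one possible way to do that with the standard library; the helper name `missing_env_vars` and the backend labels `"openai"` and `"huggingface"` are hypothetical and not taken from the repository.

```python
import os
from typing import Dict, List, Mapping

# Variable names come from the list above; the backend labels are assumed.
REQUIRED_VARS: Dict[str, List[str]] = {
    "openai": ["OPENAI_API_KEY", "OPENAI_API_BASE"],
    "huggingface": ["HUGGINGFACEHUB_API_TOKEN"],
}


def missing_env_vars(backend: str, env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the variables for `backend` that are unset or empty in `env`."""
    return [name for name in REQUIRED_VARS.get(backend, []) if not env.get(name)]


if __name__ == "__main__":
    for backend in REQUIRED_VARS:
        missing = missing_env_vars(backend)
        status = "ok" if not missing else "missing " + ", ".join(missing)
        print(f"{backend}: {status}")
```

Passing the environment as a parameter keeps the check easy to exercise with a plain dictionary instead of the real process environment.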
Dependencies
All required packages are listed in requirements.txt.