---
title: GAIA Benchmark Agent - Final Assessment
emoji: 🕵🏻‍♂️
colorFrom: indigo
colorTo: indigo
sdk: gradio
sdk_version: 5.25.2
app_file: app.py
pinned: false
hf_oauth: true
# optional, default duration is 8 hours/480 minutes. Max duration is 30 days/43200 minutes.
hf_oauth_expiration_minutes: 480
---
# AI Agent for GAIA Benchmark
**Final assessment for the Hugging Face AI Agents course**
This repository contains a fully implemented autonomous agent designed to solve level 1 of the [GAIA benchmark](https://arxiv.org/abs/2311.12983). The agent leverages large language models and a suite of external tools to tackle complex, real-world, multi-modal tasks. It is ready to run and submit answers to the GAIA evaluation server, and is deployable as a HuggingFace Space with a Gradio interface.
## Project Summary
- **Purpose:** Automatically solve and submit answers for the GAIA benchmark, which evaluates generalist AI agents on tasks requiring reasoning, code execution, web search, data analysis, and more.
- **Features:**
- Uses LLMs (OpenAI, HuggingFace, etc.) for reasoning and planning
- Integrates multiple tools: web search, Wikipedia, Python code execution, YouTube transcript, and more
- Handles file-based and multi-modal tasks
- Submits results and displays scores in a user-friendly Gradio interface
## How to Run
**On HuggingFace Spaces:**
- Log in with your HuggingFace account.
- Click "Run Evaluation & Submit All Answers" to evaluate the agent on the GAIA benchmark and see your score.
**Locally:**
```bash
pip install -r requirements.txt
python app.py
```
## About GAIA
GAIA is a challenging benchmark for evaluating the capabilities of generalist AI agents on real-world, multi-step, and multi-modal tasks. Each task may require code execution, web search, data analysis, or other tool use. This agent is designed to autonomously solve such tasks and submit answers for evaluation.
## Architecture
- `app.py` – Gradio app and evaluation logic. Fetches questions, runs the agent, and submits answers
- `agent.py` – Main `Agent` class. Implements reasoning, tool use, and answer formatting
- `model.py` – Loads and manages LLM backends (OpenAI, HuggingFace, LiteLLM, etc.)
- `tools.py` – Implements external tools (web search, Wikipedia, Python code execution, YouTube transcripts, etc.)
- `utils/logger.py` – Logging utility
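The way these modules fit together is easiest to see as a sketch. The snippet below is illustrative only: the `Agent` constructor, its callable interface, and the question/answer field names are assumptions made for this example, not the repository's actual API (see `app.py` and `agent.py` for the real logic).
```python
# Illustrative sketch of the evaluation flow, not the actual implementation.
# The Agent constructor, its callable interface, and the dict keys below
# ("task_id", "question", "submitted_answer") are assumptions for this example.
from agent import Agent  # main reasoning / tool-use loop (agent.py)


def run_evaluation(questions: list[dict]) -> list[dict]:
    """Run the agent over GAIA questions and build the submission payload."""
    agent = Agent()  # assumed default construction; backends and tools live in model.py / tools.py
    answers = []
    for q in questions:
        answer = agent(q["question"])  # assumed callable interface returning a formatted answer
        answers.append({"task_id": q["task_id"], "submitted_answer": answer})
    return answers
```
In the actual app, `app.py` fetches the questions from the evaluation server, runs the agent over them, and submits the answers together with your HuggingFace username.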
## Environment Variables
Some models require API keys. Set them as Space secrets or in your local environment:
- `OPENAI_API_KEY` and `OPENAI_API_BASE` (for OpenAI models)
- `HUGGINGFACEHUB_API_TOKEN` (for HuggingFace Hub models)
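Before running locally, you can quickly check that the keys are visible to Python. This is a minimal sketch assuming the code reads them from the environment; it is not part of the repository:
```python
import os

# Sanity check: at least one backend needs credentials.
openai_key = os.getenv("OPENAI_API_KEY")
openai_base = os.getenv("OPENAI_API_BASE")  # only needed for custom OpenAI-compatible endpoints
hf_token = os.getenv("HUGGINGFACEHUB_API_TOKEN")

if not (openai_key or hf_token):
    raise RuntimeError(
        "Set OPENAI_API_KEY (and optionally OPENAI_API_BASE) or "
        "HUGGINGFACEHUB_API_TOKEN before running the agent."
    )
```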
## Dependencies
All required packages are listed in `requirements.txt`.