File size: 2,607 Bytes
2705160 e035237 62ad9da 2705160 d123508 2705160 e035237 a225ae4 e035237 a225ae4 e035237 a225ae4 e035237 a225ae4 e035237 a225ae4 e035237 a225ae4 e035237 a225ae4 e035237 |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 |
---
title: GAIA Benchmark Agent - Final Assessment
emoji: π΅π»ββοΈ
colorFrom: indigo
colorTo: indigo
sdk: gradio
sdk_version: 5.25.2
app_file: app.py
pinned: false
hf_oauth: true
# optional, default duration is 8 hours/480 minutes. Max duration is 30 days/43200 minutes.
hf_oauth_expiration_minutes: 480
---
# AI Agent for GAIA Benchmark
**Final assessment for the Hugging Face AI Agents course**
This repository contains a fully implemented autonomous agent designed to solve the [GAIA benchmark](https://arxiv.org/abs/2311.12983) - level 1. The agent leverages large language models and a suite of external tools to tackle complex, real-world, multi-modal tasks. It is ready to run and submit answers to the GAIA evaluation server, and is deployable as a HuggingFace Space with a Gradio interface.
## Project Summary
- **Purpose:** Automatically solve and submit answers for the GAIA benchmark, which evaluates generalist AI agents on tasks requiring reasoning, code execution, web search, data analysis, and more.
- **Features:**
- Uses LLMs (OpenAI, HuggingFace, etc.) for reasoning and planning
- Integrates multiple tools: web search, Wikipedia, Python code execution, YouTube transcript, and more
- Handles file-based and multi-modal tasks
- Submits results and displays scores in a user-friendly Gradio interface
## How to Run
**On HuggingFace Spaces:**
- Log in with your HuggingFace account.
- Click "Run Evaluation & Submit All Answers" to evaluate the agent on the GAIA benchmark and see your score.
**Locally:**
```bash
pip install -r requirements.txt
python app.py
```
## About GAIA
GAIA is a challenging benchmark for evaluating the capabilities of generalist AI agents on real-world, multi-step, and multi-modal tasks. Each task may require code execution, web search, data analysis, or other tool use. This agent is designed to autonomously solve such tasks and submit answers for evaluation.
## Architecture
- `app.py` β Gradio app and evaluation logic. Fetches questions, runs the agent, and submits answers
- `agent.py` β Main `Agent` class. Implements reasoning, tool use, and answer formatting
- `model.py` β Loads and manages LLM backends (OpenAI, HuggingFace, LiteLLM, etc.)
- `tools.py` β Implements external tools
- `utils/logger.py` β Logging utility
## Environment Variables
Some models require API keys. Set these in your Space or local environment:
- `OPENAI_API_KEY` and `OPENAI_API_BASE` (for OpenAI models)
- `HUGGINGFACEHUB_API_TOKEN` (for HuggingFace Hub models)
## Dependencies
All required packages are listed in `requirements.txt`
|