Daniil Bogdanov committed
Commit · e035237
1 Parent(s): c531eac
Release v2
README.md
CHANGED
@@ -1,5 +1,5 @@
 ---
-title:
+title: GAIA Benchmark Agent - Final Assessment
 emoji: 🕵🏻‍♂️
 colorFrom: indigo
 colorTo: indigo
@@ -12,4 +12,47 @@ hf_oauth: true
 hf_oauth_expiration_minutes: 480
 ---
 
-
+# AI Agent for GAIA Benchmark
+
+**Final assessment for the Hugging Face AI Agents course**
+
+This repository contains a fully implemented autonomous agent designed to solve the [GAIA benchmark](https://arxiv.org/abs/2403.08790) - level 1. The agent leverages large language models and a suite of external tools to tackle complex, real-world, multi-modal tasks. It is ready to run and submit answers to the GAIA evaluation server, and is deployable as a HuggingFace Space with a Gradio interface.
+
+## Project Summary
+- **Purpose:** Automatically solve and submit answers for the GAIA benchmark, which evaluates generalist AI agents on tasks requiring reasoning, code execution, web search, data analysis, and more.
+- **Features:**
+  - Uses LLMs (OpenAI, HuggingFace, etc.) for reasoning and planning
+  - Integrates multiple tools: web search, Wikipedia, Python code execution, YouTube transcript, and more
+  - Handles file-based and multi-modal tasks
+  - Submits results and displays scores in a user-friendly Gradio interface
+
+## How to Run
+
+**On HuggingFace Spaces:**
+- Log in with your HuggingFace account.
+- Click "Run Evaluation & Submit All Answers" to evaluate the agent on the GAIA benchmark and see your score.
+
+**Locally:**
+```bash
+pip install -r requirements.txt
+python app.py
+```
+
+## About GAIA
+GAIA is a challenging benchmark for evaluating the capabilities of generalist AI agents on real-world, multi-step, and multi-modal tasks. Each task may require code execution, web search, data analysis, or other tool use. This agent is designed to autonomously solve such tasks and submit answers for evaluation.
+
+## Architecture
+- `app.py` - Gradio app and evaluation logic. Fetches questions, runs the agent, and submits answers
+- `agent.py` - Main `Agent` class. Implements reasoning, tool use, and answer formatting
+- `model.py` - Loads and manages LLM backends (OpenAI, HuggingFace, LiteLLM, etc.)
+- `tools.py` - Implements external tools
+- `utils/logger.py` - Logging utility
+- `requirements.txt` - All dependencies for local and Spaces deployment
+
+## Environment Variables
+Some models require API keys. Set these in your Space or local environment:
+- `OPENAI_API_KEY` and `OPENAI_API_BASE` (for OpenAI models)
+- `HUGGINGFACEHUB_API_TOKEN` (for HuggingFace Hub models)
+
+## Dependencies
+All required packages are listed in `requirements.txt`
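The environment variables named in the README hunk above could be checked before the app starts. The following is only a minimal sketch: the variable names come from the README, while the `REQUIRED_ENV` mapping, the backend labels, and the `missing_env_vars` helper are hypothetical and not part of this commit.

```python
import os

# Variable names are taken from the README; grouping them by backend
# under these labels is an assumption made for illustration.
REQUIRED_ENV = {
    "openai": ("OPENAI_API_KEY", "OPENAI_API_BASE"),
    "huggingface": ("HUGGINGFACEHUB_API_TOKEN",),
}


def missing_env_vars(backend, environ=None):
    """Return the variables for `backend` that are unset or empty."""
    env = os.environ if environ is None else environ
    return [name for name in REQUIRED_ENV[backend] if not env.get(name)]


if __name__ == "__main__":
    # Report, per backend, which variables still need to be set.
    for backend in REQUIRED_ENV:
        missing = missing_env_vars(backend)
        status = "ok" if not missing else "missing " + ", ".join(missing)
        print(f"{backend}: {status}")
```

Passing a plain dictionary as `environ` keeps the helper easy to test without touching the real process environment.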
app.py
CHANGED
@@ -191,14 +191,12 @@ with gr.Blocks() as demo:
 """
 **Instructions:**
 
-1.
-2.
-3. Click 'Run Evaluation & Submit All Answers' to fetch questions, run your agent, submit answers, and see the score.
+1. Log in to your Hugging Face account using the button below. This uses your HF username for submission.
+2. Click 'Run Evaluation & Submit All Answers' to fetch questions, run your agent, submit answers, and see the score.
 
 ---
 **Disclaimers:**
-Once clicking on the "submit button, it can take quite some time (
-This space provides a basic setup and is intentionally sub-optimal to encourage you to develop your own, more robust solution. For instance for the delay process of the submit button, a solution could be to cache the answers and submit in a seperate action or even to answer the questions in async.
+Once clicking on the "submit button, it can take quite some time (this is the time for the agent to go through all the questions).
 """
 )
 
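The disclaimer removed in this hunk suggested caching answers and answering the questions asynchronously. One way that idea might look is sketched below. This is an illustration under stated assumptions, not code from `app.py`: `answer_question` is a hypothetical stand-in for the real agent call, and the JSON cache file and the `answer_all` helper are likewise invented for the example.

```python
import asyncio
import json
from pathlib import Path


def answer_question(question: str) -> str:
    """Hypothetical stand-in for the agent call; replace with the real agent."""
    return question.upper()


async def answer_all(questions, cache_path, max_concurrency=4):
    """Answer questions concurrently, reusing answers cached on disk.

    `questions` maps a task id (string) to its question text.
    """
    cache_file = Path(cache_path)
    cache = json.loads(cache_file.read_text()) if cache_file.exists() else {}
    limit = asyncio.Semaphore(max_concurrency)

    async def solve(task_id, question):
        if task_id in cache:
            return  # already answered in an earlier run
        async with limit:
            # Run the blocking agent call in a worker thread.
            cache[task_id] = await asyncio.to_thread(answer_question, question)

    await asyncio.gather(*(solve(tid, q) for tid, q in questions.items()))
    # Persist the answers so submission can happen later as a separate action.
    cache_file.write_text(json.dumps(cache, indent=2))
    return cache
```

A caller could run `asyncio.run(answer_all({"t1": "some question"}, "answers.json"))` and submit the cached answers in a later step. Note that `asyncio.to_thread` requires Python 3.9 or newer.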