Daniil Bogdanov committed on
Commit e035237 · 1 Parent(s): c531eac

Release v2

Files changed (2)
  1. README.md +45 -2
  2. app.py +3 -5
README.md CHANGED
@@ -1,5 +1,5 @@
  ---
- title: Template Final Assignment
+ title: GAIA Benchmark Agent - Final Assessment
  emoji: 🕵🏻‍♂️
  colorFrom: indigo
  colorTo: indigo
@@ -12,4 +12,47 @@ hf_oauth: true
  hf_oauth_expiration_minutes: 480
  ---

- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
+ # AI Agent for GAIA Benchmark
+
+ **Final assessment for the Hugging Face AI Agents course**
+
+ This repository contains a fully implemented autonomous agent designed to solve Level 1 of the [GAIA benchmark](https://arxiv.org/abs/2403.08790). The agent leverages large language models and a suite of external tools to tackle complex, real-world, multi-modal tasks. It is ready to run and submit answers to the GAIA evaluation server, and is deployable as a Hugging Face Space with a Gradio interface.
+
+ ## 🏆 Project Summary
+ - **Purpose:** Automatically solve and submit answers for the GAIA benchmark, which evaluates generalist AI agents on tasks requiring reasoning, code execution, web search, data analysis, and more.
+ - **Features:**
+   - Uses LLMs (OpenAI, Hugging Face, etc.) for reasoning and planning
+   - Integrates multiple tools: web search, Wikipedia, Python code execution, YouTube transcripts, and more
+   - Handles file-based and multi-modal tasks
+   - Submits results and displays scores in a user-friendly Gradio interface
+
+ ## 🚀 How to Run
+
+ **On Hugging Face Spaces:**
+ - Log in with your Hugging Face account.
+ - Click "Run Evaluation & Submit All Answers" to evaluate the agent on the GAIA benchmark and see your score.
+
+ **Locally:**
+ ```bash
+ pip install -r requirements.txt
+ python app.py
+ ```
+
+ ## 🧠 About GAIA
+ GAIA is a challenging benchmark that evaluates generalist AI agents on real-world, multi-step, multi-modal tasks. Each task may require code execution, web search, data analysis, or other tool use. This agent is designed to solve such tasks autonomously and submit its answers for evaluation.
+
+ ## 🏗️ Architecture
+ - `app.py` — Gradio app and evaluation logic. Fetches questions, runs the agent, and submits answers
+ - `agent.py` — Main `Agent` class. Implements reasoning, tool use, and answer formatting
+ - `model.py` — Loads and manages LLM backends (OpenAI, Hugging Face, LiteLLM, etc.)
+ - `tools.py` — Implements external tools
+ - `utils/logger.py` — Logging utility
+ - `requirements.txt` — All dependencies for local and Spaces deployment
+
+ ## 🔑 Environment Variables
+ Some models require API keys. Set these in your Space settings or local environment:
+ - `OPENAI_API_KEY` and `OPENAI_API_BASE` (for OpenAI models)
+ - `HUGGINGFACEHUB_API_TOKEN` (for Hugging Face Hub models)
+
+ ## 📦 Dependencies
+ All required packages are listed in `requirements.txt`.
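
The fetch, run, and submit flow that the README attributes to `app.py` can be sketched roughly as follows. The `build_submission` helper and the payload field names are illustrative assumptions for this sketch, not the repository's actual code:

```python
"""Hypothetical sketch of the evaluation flow described in the README:
fetch GAIA questions, run the agent on each, and package the answers
for submission. All names and the payload shape are assumptions."""

from typing import Callable


def build_submission(
    questions: list[dict],
    agent: Callable[[str], str],
    username: str,
) -> dict:
    """Answer every question with the agent and build a submission payload."""
    answers = [
        {"task_id": q["task_id"], "submitted_answer": agent(q["question"])}
        for q in questions
    ]
    return {"username": username, "answers": answers}
```

In the real app, a payload like this would be POSTed to the evaluation server and the returned score displayed in the Gradio UI.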
app.py CHANGED
@@ -191,14 +191,12 @@ with gr.Blocks() as demo:
  """
  **Instructions:**

- 1. Please clone this space, then modify the code to define your agent's logic, the tools, the necessary packages, etc ...
- 2. Log in to your Hugging Face account using the button below. This uses your HF username for submission.
- 3. Click 'Run Evaluation & Submit All Answers' to fetch questions, run your agent, submit answers, and see the score.
+ 1. Log in to your Hugging Face account using the button below. This uses your HF username for submission.
+ 2. Click 'Run Evaluation & Submit All Answers' to fetch questions, run your agent, submit answers, and see the score.

  ---
  **Disclaimers:**
- Once clicking on the "submit button, it can take quite some time ( this is the time for the agent to go through all the questions).
- This space provides a basic setup and is intentionally sub-optimal to encourage you to develop your own, more robust solution. For instance for the delay process of the submit button, a solution could be to cache the answers and submit in a seperate action or even to answer the questions in async.
+ Once you click the submit button, it can take quite some time (this is the time it takes the agent to go through all the questions).
  """
  )
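
The disclaimer removed in this commit suggested caching answers and submitting them in a separate action to avoid the long submit delay. A minimal sketch of that caching idea, with entirely hypothetical file and function names:

```python
"""Sketch of the answer-caching idea from the removed disclaimer:
run the agent once, cache the answers on disk, and let a later submit
action reuse them. Names here are hypothetical, not the Space's code."""

import json
from pathlib import Path
from typing import Callable


def cached_answers(
    questions: list[dict],
    agent: Callable[[str], str],
    cache_path: Path,
) -> dict:
    """Return cached answers if present; otherwise run the agent and cache."""
    if cache_path.exists():
        return json.loads(cache_path.read_text())
    answers = {q["task_id"]: agent(q["question"]) for q in questions}
    cache_path.write_text(json.dumps(answers))
    return answers
```

A separate submit button could then read the cache and POST it, so the slow agent run and the submission are decoupled.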