agent-course-final-assessment

Daniil Bogdanov committed on Apr 27

Commit e035237 (1 parent: c531eac)

Release v2

Files changed (2)

README.md +45 -2
app.py +3 -5

README.md CHANGED

@@ -1,5 +1,5 @@
 ---
-title: Template Final Assignment
+title: GAIA Benchmark Agent - Final Assessment
 emoji: 🕵🏻‍♂️
 colorFrom: indigo
 colorTo: indigo
@@ -12,4 +12,47 @@ hf_oauth: true
 hf_oauth_expiration_minutes: 480
 ---
-Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
+# AI Agent for GAIA Benchmark
+**Final assessment for the Hugging Face AI Agents course**
+This repository contains a fully implemented autonomous agent designed to solve the [GAIA benchmark](https://arxiv.org/abs/2403.08790) - level 1. The agent leverages large language models and a suite of external tools to tackle complex, real-world, multi-modal tasks. It is ready to run and submit answers to the GAIA evaluation server, and is deployable as a HuggingFace Space with a Gradio interface.
+## 🏆 Project Summary
+- **Purpose:** Automatically solve and submit answers for the GAIA benchmark, which evaluates generalist AI agents on tasks requiring reasoning, code execution, web search, data analysis, and more.
+- **Features:**
+  - Uses LLMs (OpenAI, HuggingFace, etc.) for reasoning and planning
+  - Integrates multiple tools: web search, Wikipedia, Python code execution, YouTube transcript, and more
+  - Handles file-based and multi-modal tasks
+  - Submits results and displays scores in a user-friendly Gradio interface
+## 🚀 How to Run
+**On HuggingFace Spaces:**
+- Log in with your HuggingFace account.
+- Click "Run Evaluation & Submit All Answers" to evaluate the agent on the GAIA benchmark and see your score.
+**Locally:**
+```bash
+pip install -r requirements.txt
+python app.py
+```
+## 🧠 About GAIA
+GAIA is a challenging benchmark for evaluating the capabilities of generalist AI agents on real-world, multi-step, and multi-modal tasks. Each task may require code execution, web search, data analysis, or other tool use. This agent is designed to autonomously solve such tasks and submit answers for evaluation.
+## 🏗️ Architecture
+- `app.py` — Gradio app and evaluation logic. Fetches questions, runs the agent, and submits answers
+- `agent.py` — Main `Agent` class. Implements reasoning, tool use, and answer formatting
+- `model.py` — Loads and manages LLM backends (OpenAI, HuggingFace, LiteLLM, etc.)
+- `tools.py` — Implements external tools
+- `utils/logger.py` — Logging utility
+- `requirements.txt` — All dependencies for local and Spaces deployment
+## 🔑 Environment Variables
+Some models require API keys. Set these in your Space or local environment:
+- `OPENAI_API_KEY` and `OPENAI_API_BASE` (for OpenAI models)
+- `HUGGINGFACEHUB_API_TOKEN` (for HuggingFace Hub models)
+## 📦 Dependencies
+All required packages are listed in `requirements.txt`
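
Note on the "Environment Variables" section added to the README: the variables it lists could be checked at startup before any model is loaded. The following is a minimal sketch and is not part of this commit. Only the variable names come from the README; the helper name `missing_env_vars` and the `"openai"` / `"huggingface"` backend labels are hypothetical, chosen for illustration.

```python
import os

# Variable names are taken from the README; the backend labels used as keys
# are hypothetical and do not come from the repository.
REQUIRED_ENV_VARS = {
    "openai": ["OPENAI_API_KEY", "OPENAI_API_BASE"],
    "huggingface": ["HUGGINGFACEHUB_API_TOKEN"],
}


def missing_env_vars(backend, environ=None):
    """Return the required variable names that are unset or empty for a backend."""
    env = os.environ if environ is None else environ
    return [name for name in REQUIRED_ENV_VARS[backend] if not env.get(name)]


if __name__ == "__main__":
    for backend in REQUIRED_ENV_VARS:
        missing = missing_env_vars(backend)
        if missing:
            print(f"{backend}: missing {', '.join(missing)}")
```

Passing a plain dictionary as `environ` keeps the check testable without touching the real process environment.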

app.py CHANGED

@@ -191,14 +191,12 @@ with gr.Blocks() as demo:
         """
         **Instructions:**
-        1.  Please clone this space, then modify the code to define your agent's logic, the tools, the necessary packages, etc ...
-        2.  Log in to your Hugging Face account using the button below. This uses your HF username for submission.
-        3.  Click 'Run Evaluation & Submit All Answers' to fetch questions, run your agent, submit answers, and see the score.
+        1.  Log in to your Hugging Face account using the button below. This uses your HF username for submission.
+        2.  Click 'Run Evaluation & Submit All Answers' to fetch questions, run your agent, submit answers, and see the score.
         ---
         **Disclaimers:**
-        Once clicking on the "submit button, it can take quite some time ( this is the time for the agent to go through all the questions).
-        This space provides a basic setup and is intentionally sub-optimal to encourage you to develop your own, more robust solution. For instance for the delay process of the submit button, a solution could be to cache the answers and submit in a seperate action or even to answer the questions in async.
+        Once clicking on the "submit button, it can take quite some time (this is the time for the agent to go through all the questions).
         """
     )