Now that you’re ready to dive deeper into the creation of your final agent, let’s see how you can submit it for review.
The Dataset
The dataset used in this leaderboard consists of 20 questions extracted from the level 1 questions of the GAIA validation set.
The chosen questions were filtered based on the number of tools and steps needed to answer them.
Based on the current state of the GAIA benchmark, we think that aiming for 30% on level 1 questions (6 of the 20) is a fair test.
Figure: GAIA current status.
The Process
Now the big question on your mind is probably: “How do I start submitting?”
For this unit, we created an API that lets you retrieve the questions and send your answers for scoring. Here is a summary of the routes (see the live documentation for interactive details):
GET /questions: Retrieve the full list of filtered evaluation questions.
GET /random-question: Fetch a single random question from the list.
GET /files/{task_id}: Download the file associated with a given task ID.
POST /submit: Submit agent answers, calculate the score, and update the leaderboard.
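As a rough illustration, the three GET routes could be called from Python using only the standard library. This is a minimal sketch, not the official client: `API_URL` is a hypothetical placeholder for the scoring API's base URL, the helper names (`route_url`, `file_url`, `fetch_questions`, and so on) are made up for this example, and it assumes `/questions` and `/random-question` return JSON while `/files/{task_id}` is read as raw bytes.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical placeholder: replace with the real base URL of the scoring API.
API_URL = "https://your-scoring-api.example"


def route_url(base_url: str, route: str) -> str:
    """Join the API base URL and a route such as "/questions"."""
    return base_url.rstrip("/") + route


def file_url(task_id: str, base_url: str = API_URL) -> str:
    """Build the URL for GET /files/{task_id}, escaping the task ID."""
    return route_url(base_url, "/files/" + urllib.parse.quote(task_id, safe=""))


def get_json(url: str, timeout: float = 30.0):
    """Send a GET request and decode the JSON response body."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def fetch_questions(base_url: str = API_URL) -> list:
    """GET /questions: the full list of filtered evaluation questions."""
    return get_json(route_url(base_url, "/questions"))


def fetch_random_question(base_url: str = API_URL) -> dict:
    """GET /random-question: a single random question."""
    return get_json(route_url(base_url, "/random-question"))


def download_task_file(task_id: str, base_url: str = API_URL) -> bytes:
    """GET /files/{task_id}: the raw bytes of the file attached to a task."""
    with urllib.request.urlopen(file_url(task_id, base_url), timeout=30.0) as resp:
        return resp.read()
```

Quoting the task ID with `urllib.parse.quote` is a defensive choice, in case an ID ever contains characters that are unsafe in a URL path.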
The /submit route compares each answer to the ground truth by EXACT MATCH, so prompt your agent carefully! The GAIA team shared a prompting example for your agent here (for this course, make sure you don’t include the text “FINAL ANSWER” in your submission: make your agent reply with the answer and nothing else).
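Because grading is an exact string comparison, a last clean-up step can help in addition to good prompting. The helper below is hypothetical and not part of the course template: it assumes the model may still echo a GAIA-style `FINAL ANSWER:` prefix, and it removes that prefix together with any surrounding whitespace before the answer is submitted.

```python
import re

# A leading "FINAL ANSWER:" marker, in any letter case, with optional spaces.
_FINAL_ANSWER_PREFIX = re.compile(r"^\s*final\s+answer\s*:\s*", re.IGNORECASE)


def clean_answer(raw: str) -> str:
    """Hypothetical clean-up step before exact-match grading.

    Drops a leading "FINAL ANSWER:" marker if present, then trims
    surrounding whitespace so only the bare answer is submitted.
    """
    return _FINAL_ANSWER_PREFIX.sub("", raw, count=1).strip()
```

For example, `clean_answer("FINAL ANSWER: Paris")` returns `"Paris"`. This does not replace a good prompt: it cannot fix answers whose content or formatting differs from the ground truth (extra words or units, for instance).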
🎨 Make the Template Your Own!
To demonstrate how to interact with the API, we’ve included a basic template as a starting point.
You are free, and actively encouraged, to change, add to, or completely restructure it! Modify it in whatever way best suits your approach and creativity.
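At the heart of such a template is a loop that runs your agent on every question and records one answer per task. Here is a minimal sketch under stated assumptions: `agent` is any hypothetical callable that takes the question text and returns a string, each question dict is assumed to expose `task_id` and `question` keys, and `run_agent_on_questions` is a name made up for this example.

```python
from typing import Callable, Dict, List


def run_agent_on_questions(
    questions: List[dict], agent: Callable[[str], str]
) -> Dict[str, str]:
    """Run a (hypothetical) agent on each question; return task_id -> answer."""
    answers: Dict[str, str] = {}
    for item in questions:
        task_id = item.get("task_id")
        question_text = item.get("question")
        if not task_id or question_text is None:
            continue  # skip malformed entries instead of crashing
        try:
            answers[task_id] = str(agent(question_text)).strip()
        except Exception as exc:
            # Keep going: a failed question is recorded as an empty answer.
            print(f"Agent failed on task {task_id}: {exc}")
            answers[task_id] = ""
    return answers
```

Recording an empty answer on failure, instead of aborting, is one possible design choice: a single crash then costs one question rather than the whole submission.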
To submit, the template computes the 3 things required by the API:
Username: your Hugging Face username (obtained here via the Gradio login), used to identify your submission.
Code Link (agent_code): the URL of your Hugging Face Space code (.../tree/main), used for verification purposes, so please keep your Space public.
Answers (answers): the list of responses ({"task_id": ..., "submitted_answer": ...}) generated by your agent for scoring.
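Putting the three pieces together, a submission is a JSON body sent to POST /submit. The sketch below uses the `agent_code` and `answers` field names listed above; the `username` key, the `API_URL` placeholder, and the helper names `build_submission` and `submit` are assumptions for illustration. It relies only on `urllib` from the standard library, so it needs no extra dependencies.

```python
import json
import urllib.request
from typing import Dict

# Hypothetical placeholder: replace with the real base URL of the scoring API.
API_URL = "https://your-scoring-api.example"


def build_submission(username: str, agent_code: str, answers: Dict[str, str]) -> dict:
    """Assemble the JSON body for POST /submit.

    `agent_code` is the URL of your public Space code (ending in /tree/main);
    `answers` maps each task_id to the answer string produced by your agent.
    The "username" key name is an assumption.
    """
    return {
        "username": username.strip(),
        "agent_code": agent_code,
        "answers": [
            {"task_id": task_id, "submitted_answer": answer}
            for task_id, answer in answers.items()
        ],
    }


def submit(payload: dict, base_url: str = API_URL, timeout: float = 60.0) -> dict:
    """POST the payload as JSON to /submit and return the decoded JSON reply."""
    request = urllib.request.Request(
        base_url.rstrip("/") + "/submit",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Keeping payload construction separate from the network call makes it easy to inspect or log exactly what will be sent before spending a submission.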