You can access the GAIA benchmark API provided for your agent's evaluation using the following endpoints, as described in the provided documentation:
Base URL
The API's base URL (according to the provided document) is:
https://agents-course-unit4-scoring.hf.space
API Endpoints
Retrieve Evaluation Questions
Endpoint: GET /questions
- Returns the full list of filtered evaluation questions.
Retrieve a Random Question
Endpoint: GET /random-question
- Fetches a single random question from the available set.
Download Associated Files
Endpoint: GET /files/{task_id}
- Downloads files associated with specific tasks, useful for questions that require external data or multimodal analysis.
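A minimal sketch of two helpers that can make file downloads less error-prone: one builds the /files/{task_id} URL, the other picks a local filename. The helper names file_url and filename_from_headers are hypothetical, and it is an assumption (not stated in the documentation above) that the server sends a Content-Disposition header; when it does not, the fallback name is used.

```python
import os
from email.message import Message
from urllib.parse import quote

BASE_URL = "https://agents-course-unit4-scoring.hf.space"


def file_url(task_id: str) -> str:
    """Build the GET /files/{task_id} URL, percent-encoding the identifier."""
    return f"{BASE_URL}/files/{quote(task_id, safe='')}"


def filename_from_headers(headers: dict, fallback: str) -> str:
    """Return the filename named in Content-Disposition, or `fallback`.

    Assumption: the server may send a Content-Disposition header.
    `headers` is looked up with the exact key shown here; the headers
    object returned by requests is case-insensitive, a plain dict is not.
    """
    value = headers.get("Content-Disposition")
    if not value:
        return fallback
    msg = Message()
    msg["Content-Disposition"] = value
    name = msg.get_filename()
    # Keep only the final path component so a server-supplied name
    # cannot write outside the current directory.
    return os.path.basename(name) if name else fallback
```

With requests, this could be used as filename_from_headers(file_response.headers, fallback=task_id) before opening the output file.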
Submit Agent Answers
Endpoint: POST /submit
- Submit your agent's answers to be evaluated against the benchmark.
- Requires a JSON payload structured as:
{
  "username": "Your Hugging Face username",
  "agent_code": "URL to your Hugging Face Space code repository",
  "answers": [
    {"task_id": "task identifier", "submitted_answer": "your answer"}
  ]
}
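The payload can be assembled programmatically rather than by hand. The following is a minimal sketch; the function name build_submission and the answers_by_task mapping are hypothetical, and converting every answer to a string is an assumption about what the scoring service expects.

```python
import json


def build_submission(username: str, agent_code: str, answers_by_task: dict) -> dict:
    """Assemble the JSON body for POST /submit from a task_id -> answer mapping."""
    if not username or not agent_code:
        raise ValueError("username and agent_code must be non-empty")
    answers = [
        # Assumption: answers are sent as strings, so other types are converted.
        {"task_id": task_id, "submitted_answer": str(answer)}
        for task_id, answer in answers_by_task.items()
    ]
    return {"username": username, "agent_code": agent_code, "answers": answers}


payload = build_submission(
    "your_username",
    "https://huggingface.co/spaces/your_username/your_space_name/tree/main",
    {"task_id": "answer_text"},
)
print(json.dumps(payload, indent=2))
```

The resulting dictionary can be passed directly as the json argument of requests.post.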
API Usage Example:
Here's an illustrative example using Python and the requests library:
import requests

BASE_URL = "https://agents-course-unit4-scoring.hf.space"

# Retrieve all questions
response = requests.get(f"{BASE_URL}/questions", timeout=30)
response.raise_for_status()  # stop early on an HTTP error status
questions = response.json()

# Fetch a random question
random_question = requests.get(f"{BASE_URL}/random-question", timeout=30).json()

# Download a file for a specific task_id
task_id = "example_task_id"
file_response = requests.get(f"{BASE_URL}/files/{task_id}", timeout=30)
file_response.raise_for_status()
with open("downloaded_file", "wb") as f:
    f.write(file_response.content)

# Submit answers
submission_payload = {
    "username": "your_username",
    "agent_code": "https://huggingface.co/spaces/your_username/your_space_name/tree/main",
    "answers": [{"task_id": "task_id", "submitted_answer": "answer_text"}],
}
submit_response = requests.post(f"{BASE_URL}/submit", json=submission_payload, timeout=30)
submit_response.raise_for_status()
print(submit_response.json())
Ensure you have proper authentication if required, and replace placeholder texts (your_username, task_id, answer_text, etc.) with your actual values.
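One way to replace the placeholder answers with real values is to run an agent over the retrieved questions and collect the results in the shape that /submit expects. This is a sketch under assumptions: collect_answers and the agent callable are hypothetical, and it assumes each question object carries "task_id" and "question" keys, which the documentation above does not confirm.

```python
from typing import Callable


def collect_answers(questions: list, agent: Callable[[str], str]) -> list:
    """Run `agent` on each question and return entries for the "answers" field."""
    answers = []
    for item in questions:
        # Assumed field names; adjust to the actual response of GET /questions.
        task_id = item.get("task_id")
        text = item.get("question")
        if task_id is None or text is None:
            continue  # skip malformed entries
        try:
            answer = agent(text)
        except Exception as exc:
            # Skip this task rather than abort the whole run.
            print(f"Agent failed on {task_id}: {exc}")
            continue
        answers.append({"task_id": task_id, "submitted_answer": str(answer)})
    return answers
```

The returned list can be used as the "answers" value of the submission payload; skipped tasks are simply left out of the submission.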