
HF_Agents_Final_Project / docs /evaluate_local_commands.md

Yago Bolivar

chore: update documentation and add evaluation scripts for GAIA project

aa49c02 4 months ago


	Run the evaluation script: open a terminal, navigate to the `utilities` directory, and run one of the following commands:

	* Evaluate all levels:
	```bash
	cd /Users/yagoairm2/Desktop/agents/final\ project/HF_Agents_Final_Project/utilities
	python evaluate_local.py --answers_file ../agent_answers.json
	```
	* Evaluate only Level 1:
	```bash
	python evaluate_local.py --answers_file ../agent_answers.json --level 1
	```
	* Evaluate Level 1 and show incorrect answers:
	```bash
	python evaluate_local.py --answers_file ../agent_answers.json --level 1 --verbose
	```

	The script calculates and prints the accuracy using the exact-match criterion used by GAIA. Nothing is submitted to the official leaderboard.
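
To make the exact-match idea concrete, here is a minimal Python sketch of how such a score could be computed. It is not the code of `evaluate_local.py`: the function names, the dictionary layouts, and the normalization (trimming whitespace and ignoring case) are assumptions made for illustration, and the official GAIA scorer may normalize answers differently. The `level` and `verbose` parameters only mirror the `--level` and `--verbose` flags shown in the commands.

```python
def normalize(answer):
    """Simplified normalization (an assumption): trim whitespace, ignore case."""
    return str(answer).strip().lower()


def evaluate(predictions, ground_truth, level=None, verbose=False):
    """Return exact-match accuracy as a float between 0.0 and 1.0.

    Both layouts below are hypothetical, chosen only for this sketch:
      predictions:  {task_id: answer}
      ground_truth: {task_id: {"answer": ..., "level": n}}
    """
    total = 0
    correct = 0
    for task_id, truth in ground_truth.items():
        if level is not None and truth["level"] != level:
            continue  # mirrors --level: only score the requested level
        total += 1
        predicted = predictions.get(task_id, "")  # a missing answer counts as wrong
        if normalize(predicted) == normalize(truth["answer"]):
            correct += 1
        elif verbose:
            # mirrors --verbose: show each incorrect answer
            print(f"{task_id}: expected {truth['answer']!r}, got {predicted!r}")
    return correct / total if total else 0.0


if __name__ == "__main__":
    # Made-up data, purely to exercise the function.
    truth = {
        "t1": {"answer": "Paris", "level": 1},
        "t2": {"answer": "42", "level": 1},
        "t3": {"answer": "blue", "level": 2},
    }
    answers = {"t1": " paris ", "t2": "41", "t3": "blue"}
    print("All levels:", evaluate(answers, truth))
    print("Level 1:", evaluate(answers, truth, level=1, verbose=True))
```

Questions with no submitted answer count as incorrect here, so the denominator is always the number of ground-truth questions at the selected level. That is a design choice of this sketch, not a confirmed behavior of the script.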