**Run the Evaluation Script:** Open your terminal, navigate to the `utilities` directory, and run the script:
* **Evaluate all levels:**
```bash
cd /Users/yagoairm2/Desktop/agents/final\ project/HF_Agents_Final_Project/utilities
python evaluate_local.py --answers_file ../agent_answers.json
```
* **Evaluate only Level 1:**
```bash
python evaluate_local.py --answers_file ../agent_answers.json --level 1
```
* **Evaluate Level 1 and show incorrect answers:**
```bash
python evaluate_local.py --answers_file ../agent_answers.json --level 1 --verbose
```
This script will calculate and print the accuracy based on the exact match criterion used by GAIA, without submitting anything to the official leaderboard.
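
To give a rough idea of what an exact-match accuracy calculation involves, here is a minimal sketch. It is not the code in `evaluate_local.py`: the function names, the dictionary layout (task ID mapped to answer string), and the normalization step (lowercasing and collapsing whitespace) are all assumptions made for illustration, and GAIA's official scorer may normalize answers differently.

```python
def normalize(answer: str) -> str:
    """Lowercase and collapse whitespace (an assumed simplification, not GAIA's official normalizer)."""
    return " ".join(str(answer).strip().lower().split())


def exact_match_accuracy(predictions: dict, ground_truth: dict) -> float:
    """Return the fraction of ground-truth tasks whose prediction matches after normalization."""
    if not ground_truth:
        return 0.0
    correct = sum(
        1
        for task_id, expected in ground_truth.items()
        # A task with no prediction counts as incorrect.
        if normalize(predictions.get(task_id, "")) == normalize(expected)
    )
    return correct / len(ground_truth)


# Hypothetical task IDs and answers, for illustration only.
truth = {"t1": "Paris", "t2": "42", "t3": "blue whale"}
preds = {"t1": " paris ", "t2": "41"}
print(f"Accuracy: {exact_match_accuracy(preds, truth):.2%}")
```

Counting unanswered tasks as incorrect keeps the denominator fixed at the number of ground-truth questions, so a partial answers file cannot inflate the score.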