**Run the Evaluation Script:** Open your terminal, navigate to the `utilities` directory, and run the script:

*   **Evaluate all levels:**
    ```bash
    cd /Users/yagoairm2/Desktop/agents/final\ projectHF_Agents_Final_Project/utilities
    python evaluate_local.py --answers_file ../agent_answers.json
    ```
*   **Evaluate only Level 1:**
    ```bash
    python evaluate_local.py --answers_file ../agent_answers.json --level 1
    ```
*   **Evaluate Level 1 and show incorrect answers:**
    ```bash
    python evaluate_local.py --answers_file ../agent_answers.json --level 1 --verbose
    ```

The script calculates and prints the accuracy using the exact match criterion used by GAIA. Nothing is submitted to the official leaderboard.
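
For intuition, exact-match accuracy can be sketched roughly as below. This is a simplified illustration, not the code in `evaluate_local.py`: the function names, the `task_id -> answer` dictionary layout, and the normalization (lowercasing and collapsing whitespace) are all assumptions, and the official GAIA scorer may normalize answers differently.

```python
def normalize(answer) -> str:
    """Lowercase and collapse whitespace (an assumed, simplified normalization)."""
    return " ".join(str(answer).strip().lower().split())


def exact_match_accuracy(predictions: dict, ground_truth: dict) -> float:
    """Return the fraction of ground-truth tasks answered with an exact match.

    Both arguments are hypothetical {task_id: answer} mappings.
    Tasks missing from `predictions` count as incorrect.
    """
    if not ground_truth:
        return 0.0
    correct = sum(
        1
        for task_id, expected in ground_truth.items()
        if task_id in predictions
        and normalize(predictions[task_id]) == normalize(expected)
    )
    return correct / len(ground_truth)


if __name__ == "__main__":
    # Toy data for illustration only, not real GAIA tasks.
    truth = {"t1": "Paris", "t2": "42", "t3": "blue whale"}
    preds = {"t1": " paris ", "t2": "41", "t3": "Blue  Whale"}
    print(f"Accuracy: {exact_match_accuracy(preds, truth):.2%}")
```

Because the comparison is exact after normalization, a near miss such as `41` versus `42` scores zero for that task. This is why the `--verbose` flag is useful for inspecting incorrect answers.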