Yago Bolivar
committed on
Commit a3c3cd5 · 1 Parent(s): 13efa1c
chore: remove outdated evaluation script documentation
docs/evaluate_local_commands.md
DELETED
@@ -1,17 +0,0 @@
-**Run the Evaluation Script:** Open your terminal, navigate to the `utilities` directory, and run the script:
-
-* **Evaluate all levels:**
-```bash
-cd /Users/yagoairm2/Desktop/agents/final\ projectHF_Agents_Final_Project/utilities
-python evaluate_local.py --answers_file ../agent_answers.json
-```
-* **Evaluate only Level 1:**
-```bash
-python evaluate_local.py --answers_file ../agent_answers.json --level 1
-```
-* **Evaluate Level 1 and show incorrect answers:**
-```bash
-python evaluate_local.py --answers_file ../agent_answers.json --level 1 --verbose
-```
-
-This script calculates and prints the accuracy using the exact-match criterion used by GAIA, without submitting anything to the official leaderboard.
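For readers unfamiliar with what the removed doc describes, here is a minimal sketch of an exact-match accuracy calculation with an optional level filter and a list of incorrect answers. It is not the contents of `evaluate_local.py`, which this commit does not show. The function names, the data layout (`task_id` mapped to an answer and a level), and the lowercase/whitespace normalization are all assumptions made for illustration, and the official GAIA scorer may normalize answers differently.

```python
def normalize(text):
    """Lowercase and collapse whitespace (an assumed, simplified normalization)."""
    return " ".join(str(text).strip().lower().split())


def exact_match_accuracy(answers, ground_truth, level=None):
    """Return (correct, total, incorrect) for tasks at `level`, or all tasks if None.

    `answers` maps task_id -> submitted answer (hypothetical layout).
    `ground_truth` maps task_id -> {"answer": ..., "level": ...} (hypothetical layout).
    """
    correct, total, incorrect = 0, 0, []
    for task_id, truth in ground_truth.items():
        if level is not None and truth["level"] != level:
            continue
        total += 1
        submitted = answers.get(task_id, "")  # a missing answer counts as wrong
        if normalize(submitted) == normalize(truth["answer"]):
            correct += 1
        else:
            incorrect.append((task_id, submitted, truth["answer"]))
    return correct, total, incorrect


# Made-up sample data, for illustration only.
ground_truth = {
    "t1": {"answer": "Paris", "level": 1},
    "t2": {"answer": "42", "level": 1},
    "t3": {"answer": "blue whale", "level": 2},
}
answers = {"t1": " paris ", "t2": "41", "t3": "Blue  Whale"}

correct, total, incorrect = exact_match_accuracy(answers, ground_truth, level=1)
print(f"Level 1 accuracy: {correct}/{total} = {correct / total:.2%}")
for task_id, got, expected in incorrect:  # roughly what a verbose mode might list
    print(f"  {task_id}: got {got!r}, expected {expected!r}")
```

Passing `level=None` scores every task, which corresponds to the "evaluate all levels" case in the removed instructions.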