
Run the Evaluation Script: Open your terminal, navigate to the utilities directory, and run one of the commands below (a sketch of the answers file they expect follows the list):

  • Evaluate all levels:
    cd /Users/yagoairm2/Desktop/agents/final\ project/HF_Agents_Final_Project/utilities
    python evaluate_local.py --answers_file ../agent_answers.json
    
  • Evaluate only Level 1:
    python evaluate_local.py --answers_file ../agent_answers.json --level 1
    
  • Evaluate Level 1 and show incorrect answers:
    python evaluate_local.py --answers_file ../agent_answers.json --level 1 --verbose
    
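The --answers_file argument points to the JSON file of answers your agent produced (assumed here to sit one directory above utilities). As a rough illustration only, the snippet below writes a file in that spirit; the field names (task_id, model_answer, level) are assumptions, so check evaluate_local.py for the schema it actually expects.

    # Hypothetical example of producing agent_answers.json.
    # Field names are assumptions, not the project's confirmed schema.
    import json

    answers = [
        {"task_id": "example-task-1", "model_answer": "42", "level": 1},
        {"task_id": "example-task-2", "model_answer": "Paris", "level": 2},
    ]

    with open("agent_answers.json", "w") as f:
        json.dump(answers, f, indent=2)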

This script will calculate and print the accuracy based on the exact match criterion used by GAIA, without submitting anything to the official leaderboard.
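For intuition, here is a minimal sketch of exact-match scoring in that spirit. It only trims whitespace and ignores case; GAIA's official scorer does additional normalization (for example of numbers and comma-separated lists), and this is not the project's actual implementation.

    def exact_match(model_answer: str, ground_truth: str) -> bool:
        # Simplified exact-match: trim surrounding whitespace and ignore case.
        return model_answer.strip().lower() == ground_truth.strip().lower()

    def accuracy(pairs):
        # pairs: iterable of (model_answer, ground_truth) tuples.
        results = [exact_match(a, t) for a, t in pairs]
        return sum(results) / len(results) if results else 0.0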