# Next Steps for GAIA Agent Development

## Current Status
- ✅ Created basic agent structure (`app2.py`)
- ✅ Set up local testing environment (`app_local.py`)
- ✅ Fixed question format handling
- ✅ Tested local environment functionality

## High Priority Tasks

### 1. LLM Integration
- [ ] Integrate Llama 3 through GPT4All
- [ ] Update system prompts so answers follow the GAIA answer format
- [ ] Implement step-by-step reasoning and final-answer extraction
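
The answer-extraction item could start from something like the sketch below. It assumes the system prompt asks the model to close its reply with a line of the form `FINAL ANSWER: ...`, which is the template suggested in the GAIA paper's system prompt. The function name `extract_final_answer` and the fallback to the last non-empty line are illustrative choices, not something that exists in `app2.py` yet.

```python
import re

# Assumed marker: the system prompt is expected to request a closing
# "FINAL ANSWER: ..." line. Adjust if the prompt uses another template.
_FINAL_ANSWER_RE = re.compile(r"FINAL ANSWER:\s*(.*)", re.IGNORECASE)


def extract_final_answer(model_output: str) -> str:
    """Return the text after the last 'FINAL ANSWER:' marker.

    Falls back to the last non-empty line when the marker is missing,
    and to an empty string when the output is blank.
    """
    matches = _FINAL_ANSWER_RE.findall(model_output)
    if matches:
        # Use the last marker in case the model repeats it while reasoning.
        return matches[-1].strip()
    lines = [line.strip() for line in model_output.splitlines() if line.strip()]
    return lines[-1] if lines else ""
```

Taking the last marker rather than the first is a deliberate guess: models sometimes quote the template mid-reasoning before giving the real answer at the end.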

### 2. Core Tool Implementation
- [ ] Web Search Tool (using SerpAPI, Google Custom Search API, or similar)
- [ ] File Reader Tool (handling different file formats)
  - [ ] Text-based files (.txt, .py, .md)
  - [ ] Images (.png, .jpg) with a vision model
  - [ ] Audio (.mp3) with speech-to-text
  - [ ] Spreadsheets (.xlsx) with pandas
- [ ] Code Interpreter Tool (safe Python execution)
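
One possible shape for the File Reader Tool is a small dispatcher keyed on the file extension, sketched below. The function names (`read_file`, `read_text_file`, `read_spreadsheet`) are hypothetical, and returning spreadsheets as CSV text is only one option for handing tabular data to the LLM. Image and audio handlers are left unimplemented because the vision and speech-to-text libraries are still TBD. Note that reading `.xlsx` files with pandas also requires the `openpyxl` package.

```python
from pathlib import Path

# Extensions taken from the task list above.
TEXT_EXTENSIONS = {".txt", ".py", ".md"}


def read_text_file(path: Path) -> str:
    """Read a text file, replacing bytes that are not valid UTF-8."""
    return path.read_text(encoding="utf-8", errors="replace")


def read_spreadsheet(path: Path) -> str:
    """Load the first sheet of an .xlsx file and return it as CSV text."""
    # Imported lazily so text files work without pandas installed.
    import pandas as pd

    frame = pd.read_excel(path)
    return frame.to_csv(index=False)


def read_file(path_str: str) -> str:
    """Dispatch to a reader based on the file extension."""
    path = Path(path_str)
    suffix = path.suffix.lower()
    if suffix in TEXT_EXTENSIONS:
        return read_text_file(path)
    if suffix == ".xlsx":
        return read_spreadsheet(path)
    # Images and audio need vision / speech-to-text models (still TBD).
    raise NotImplementedError(f"No reader implemented yet for '{suffix}' files")
```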

### 3. Question Analysis & Planning
- [ ] Use the LLM for question classification
- [ ] Implement multi-step reasoning for complex questions
- [ ] Handle file references in questions
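
For file references, a cheap first pass before involving the LLM might be a regular expression that picks out file-like tokens from the question text. The sketch below is an assumption about how this could work, not existing code: `find_file_references` is a hypothetical helper, and the extension list simply mirrors the formats named under the File Reader Tool.

```python
import re

# Extensions mirrored from the File Reader Tool list; extend as needed.
KNOWN_EXTENSIONS = ("txt", "py", "md", "png", "jpg", "mp3", "xlsx")

_FILE_REF_RE = re.compile(
    r"[\w\-./]+\.(?:" + "|".join(KNOWN_EXTENSIONS) + r")\b",
    re.IGNORECASE,
)


def find_file_references(question: str) -> list[str]:
    """Return file-like tokens in the question, in order, without duplicates."""
    seen: list[str] = []
    for match in _FILE_REF_RE.findall(question):
        if match not in seen:
            seen.append(match)
    return seen
```

A pattern like this will miss attachments that the question never names explicitly, so it is probably best treated as a hint for the planner rather than the only detection path.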

### 4. Testing & Evaluation
- [ ] Create test cases for each question type
- [ ] Use `utilities/evaluate_local.py` to evaluate performance
- [ ] Track accuracy improvements
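
To track accuracy between runs, a simple exact-match score over task IDs may be enough to start with. The sketch below is a simplified stand-in: it is not the official GAIA scorer and not necessarily what `utilities/evaluate_local.py` does. The names `normalize_answer` and `accuracy`, and the choice to lowercase and collapse whitespace before comparing, are assumptions made for illustration.

```python
def normalize_answer(answer: str) -> str:
    """Lowercase, trim, and collapse internal whitespace."""
    return " ".join(answer.strip().lower().split())


def accuracy(predictions: dict[str, str], expected: dict[str, str]) -> float:
    """Fraction of expected answers matched exactly after normalization.

    Both arguments map a task ID to an answer string.
    Tasks with no prediction count as wrong.
    """
    if not expected:
        return 0.0
    correct = sum(
        1
        for task_id, truth in expected.items()
        if task_id in predictions
        and normalize_answer(predictions[task_id]) == normalize_answer(truth)
    )
    return correct / len(expected)
```

Logging this single number after each change to the agent gives a rough trend line, even if the real scorer applies stricter or looser matching rules.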

## Dependencies to Add
- [ ] `gpt4all` for the LLM
- [ ] `beautifulsoup4` for web scraping (if needed)
- [ ] `pandas` for spreadsheet handling
- [ ] Vision and speech-to-text libraries (TBD)

## Notes
- The GPT4All model path seems to be `/Users/yagoairm2/Library/Application Support/nomic.ai/GPT4All/Meta-Llama-3-8B-Instruct.Q4_0.gguf`
- Use `common_questions.json` for testing
- Follow the GAIA evaluation criteria for exact answer matching