Space: schoolkithub/multi-agent-gaia-system (status: Runtime error)

Commit a9d900f (1 parent: f58a18b)
Omachoko committed 17 days ago

Finalize: move advanced agent to root, clean up, ready for deployment

Files changed (7)

.gitignore +0 -91
README.md +26 -258
app.py +371 -15
gaia_agent.py +0 -397
requirements.txt +10 -16
tests/test_agent_core.py +0 -38
tests/test_video_qa.py +0 -22

.gitignore DELETED

@@ -1,91 +0,0 @@
-# Python
-__pycache__/
-*.py[cod]
-*$py.class
-*.so
-.Python
-build/
-develop-eggs/
-dist/
-downloads/
-eggs/
-.eggs/
-lib/
-lib64/
-parts/
-sdist/
-var/
-wheels/
-pip-wheel-metadata/
-share/python-wheels/
-*.egg-info/
-.installed.cfg
-*.egg
-MANIFEST
-# Virtual Environments
-.env
-.venv
-env/
-venv/
-ENV/
-env.bak/
-venv.bak/
-gaia_env/
-# IDE
-.vscode/
-.idea/
-*.swp
-*.swo
-*~
-# OS
-.DS_Store
-.DS_Store?
-._*
-.Spotlight-V100
-.Trashes
-ehthumbs.db
-Thumbs.db
-# Logs
-*.log
-logs/
-# Environment variables
-.env
-.env.local
-.env.development.local
-.env.test.local
-.env.production.local
-# Jupyter Notebook
-.ipynb_checkpoints
-# pytest
-.pytest_cache/
-.tox/
-.coverage
-htmlcov/
-# mypy
-.mypy_cache/
-.dmypy.json
-dmypy.json
-# Hugging Face
-wandb/ __pycache__/
-__pycache__/
-# New additions
-gaia_env/
-gaia_agent.log
-*.pyc
-*.pyo
-*.pyd
-*.swp
-.DS_Store
-.env
-venv/
-gaia_agent_files/

README.md CHANGED

@@ -1,272 +1,40 @@
 ---
-title: Enhanced GAIA Agent - Full Benchmark Implementation
-emoji: 🚀
-colorFrom: blue
-colorTo: green
 sdk: gradio
-sdk_version: 4.44.0
 app_file: app.py
 pinned: false
-license: mit
 ---
-# 🚀 Enhanced GAIA Agent - Full Benchmark Implementation
-**Optimized for 30%+ performance on GAIA benchmark with complete API integration**
-## 🎯 Overview
-This is a comprehensive GAIA (General AI Assistants) agent implementation designed to achieve the target 30% performance for course certification. The agent features complete API integration, enhanced multi-step reasoning, and advanced tool orchestration.
-## ✨ Key Enhancements
-### 🔗 **Full GAIA API Integration**
-- ✅ Fetch questions from official GAIA API (`GET /questions`)
-- ✅ Get random questions (`GET /random-question`)
-- ✅ Download task files (`GET /files/{task_id}`)
-- ✅ Submit answers for official scoring (`POST /submit`)
-- ✅ Real-time leaderboard submission
-### 🧠 **Enhanced Multi-Step Reasoning**
-- **Advanced Workflow**: Analyze → Plan → Act → Observe → Reason → Answer
-- **Reasoning Memory**: Maintains context across 15+ reasoning steps
-- **Question Classification**: Automatic complexity assessment (Level 1-3)
-- **Tool Orchestration**: Intelligent tool selection and execution
-### 🛠️ **Enhanced Tool Arsenal** (9 Tools)
-1. **🧮 Enhanced Calculator** - Complex mathematical operations
-2. **🌐 Enhanced Web Search** - Expanded knowledge base (20+ countries)
-3. **🖼️ Image Analyzer** - Visual content processing and spatial reasoning
-4. **📄 Document Reader** - File content extraction
-5. **📁 File Processor** - Download and process GAIA task files
-6. **📅 Date Calculator** - Temporal reasoning and age calculations
-7. **🔄 Unit Converter** - Length, temperature, weight conversions
-8. **📝 Text Analyzer** - Content analysis and pattern extraction
-9. **🧠 Reasoning Chain** - Multi-step logical synthesis
-### 📊 **Enhanced Knowledge Base**
-- **Geography**: 20+ countries and capitals
-- **Astronomy**: Solar system facts, planet classifications (8 planets, 4 gas giants)
-- **History**: Key events (Berlin Wall fall 1989, Cold War end, etc.)
-- **Mathematics**: Constants (π, e, golden ratio) and conversion factors
-- **Arts**: Famous paintings and artists
-## 🎯 GAIA Compliance Features
-### ✅ **Level 1**: Basic Questions (<5 steps)
-- Simple mathematical calculations
-- Geographic knowledge queries
-- Basic factual lookups
-### ✅ **Level 2**: Multi-Step Reasoning (5-10 steps)
-- Complex calculations with multiple components
-- Cross-domain knowledge synthesis
-- Tool coordination and chaining
-### ✅ **Level 3**: Long-Term Planning
-- Advanced reasoning with 15+ steps
-- File processing and analysis
-- Multi-modal understanding simulation
-## 🚀 Performance Targets
-| Metric | Target | Baseline | Status |
-|--------|--------|----------|---------|
-| **Minimum Required** | 30% | GPT-4 ~15% | 🎯 Optimized |
-| **Enhanced Target** | 35-45% | Human ~92% | 📈 Achievable |
-| **Certification** | 30%+ | Course Requirement | ✅ Ready |
-## 🛠️ Technical Implementation
-### Core Components
-- `gaia_agent.py`: Enhanced agent with full capabilities (800+ lines)
-- `app.py`: Complete Gradio interface with API integration
-- `requirements.txt`: Enhanced dependencies for full functionality
-### Enhanced Dependencies
-```
-gradio==4.44.0          # Latest UI framework
-requests==2.31.0        # API connectivity
-pandas==2.1.0           # Data processing
-beautifulsoup4==4.12.2  # Content parsing
-pillow==10.0.1          # Image processing
-markdownify==0.11.6     # Document formatting
-```
-### API Integration
-```python
-# Fetch questions
-questions = agent.get_questions()
-# Process with file support
-answer = agent.query(question, task_id="task_123")
-# Submit for scoring
-result = agent.submit_answer(username, agent_code_url, answers)
-```
-## 📱 User Interface
-### 🎯 **GAIA Questions Tab**
-- Fetch real questions from GAIA API
-- Automatic file download and processing
-- Enhanced reasoning with memory display
-### ✏️ **Manual Input Tab**
-- Test custom questions
-- Example questions for different complexity levels
-- Immediate processing and feedback
-### 📊 **Submission & Scoring Tab**
-- Official GAIA leaderboard submission
-- Progress tracking and statistics
-- Performance monitoring
-### 🛠️ **Agent Details Tab**
-- Complete capability documentation
-- Tool descriptions and examples
-- Performance benchmarks
-## 🧪 Example Capabilities
-### Mathematical Reasoning
-```
-Q: If there are 8 planets and 4 are gas giants, how many are not gas giants?
-A: 4
-```
-### Geographic Knowledge
-```
-Q: What is the capital of Germany?
-A: Berlin
-```
-### Historical Research
-```
-Q: Who was the US president when the Berlin Wall fell?
-A: George H.W. Bush
-```
-### Complex Calculations
-```
-Q: Convert 100 degrees Celsius to Fahrenheit
-A: 212.0
-```
-## 🎯 Usage Instructions
-### 1. **Setup Environment**
-```bash
-pip install -r requirements.txt
-python app.py
-```
-### 2. **Fetch GAIA Questions**
-- Click "Get Random Question" to fetch from API
-- Questions include task ID and associated files
-- Files are automatically downloaded and processed
-### 3. **Process Questions**
-- Enhanced agent uses 15-step reasoning
-- Multiple tools are orchestrated intelligently
-- Reasoning memory is displayed for transparency
-### 4. **Submit for Scoring**
-- Provide Hugging Face username
-- Include agent code URL (your Space link)
-- Submit accumulated answers for official scoring
-## 🏆 Certification Ready
-This implementation is specifically optimized to achieve the **30% target performance** required for course certification:
-- ✅ **Complete API Integration** - Connects to official GAIA endpoints
-- ✅ **Enhanced Reasoning** - 15-step multi-tool workflow
-- ✅ **Expanded Knowledge** - Comprehensive knowledge base
-- ✅ **File Processing** - Handles task-associated files
-- ✅ **Clean Formatting** - Exact match answer preparation
-- ✅ **Progress Tracking** - Real-time performance monitoring
-## 📊 Optimization Results
-| Component | Before | After | Improvement |
-|-----------|--------|-------|-------------|
-| **Tools** | 5 basic | 9 enhanced | +80% capability |
-| **Knowledge Base** | 8 entries | 50+ entries | +500% coverage |
-| **Reasoning Steps** | 10 max | 15 max | +50% depth |
-| **API Integration** | None | Full | Complete |
-| **File Support** | None | TXT/JSON/CSV | Advanced |
----
-**🎯 Ready for GAIA Benchmark - Targeting 30%+ Performance for Course Certification**
-# Modular GAIA Agent
-A production-ready, GAIA benchmark-compliant agent for Hugging Face's AI Agents course. Handles multi-modal questions, file downloads, and tool chaining with strict GAIA output formatting.
-## Features
-- Modular tool/LLM registry (easy to extend)
-- Best-in-class Hugging Face models for LLM, QA, table QA, ASR, image captioning
-- File download/caching and type routing
-- Multi-step reasoning and tool chaining
-- GAIA-compliant output and reasoning trace
-- **Advanced YouTube/Video QA**: Frame extraction, object detection (YOLOv8), image captioning (BLIP), and audio transcription (Whisper)
-- **Robust error handling and logging**: All errors are logged to `gaia_agent.log` and user-friendly messages are returned
-- **Secure code execution**: Python code is run in a subprocess with timeout and resource limits
-- **Automated testing**: Unit and integration tests with pytest
 ## Usage
-### Install dependencies
-```bash
-pip install -r requirements.txt
-# Also install yt-dlp (for YouTube/video QA)
-pip install yt-dlp
-# Download YOLOv8 weights if needed
-python -c "from ultralytics import YOLO; YOLO('yolov8n.pt')"
-```
-### Run the agent
-```python
-from gaia_agent import ModularGAIAAgent
-agent = ModularGAIAAgent()
-results = agent.run(from_api=True)
-for r in results:
-    print(r)
-```
-### Run the Gradio UI
-```bash
-python app.py
-```
-### Run tests
-```bash
-pytest tests/
-```
-### Debugging and Logging
-- All errors and important events are logged to `gaia_agent.log`.
-- Set the agent's debug flag for verbose output (see code).
-### Security
-- Python code is executed in a subprocess with a timeout (default 5s).
-- For extra safety, consider running the agent in a containerized environment.
-## File Structure
-- `gaia_agent.py`: Main agent logic
-- `requirements.txt`: Dependencies
-- `README.md`: This file
-- `app.py`: Gradio UI
-- `tests/`: Automated tests
-- `gaia_agent_files/`: Example/context files
-## Example Screenshot
-![screenshot placeholder](screenshot.png)
-## Notes
-- Requires a Hugging Face token for some models/APIs
-- Designed for easy extension and robust, production use
-- For video QA, ensure `yt-dlp` and YOLOv8 weights are available

 ---
+title: Template Final Assignment
+emoji: 🕵🏻‍♂️
+colorFrom: indigo
+colorTo: indigo
 sdk: gradio
+sdk_version: 5.25.2
 app_file: app.py
 pinned: false
+hf_oauth: true
+# optional, default duration is 8 hours/480 minutes. Max duration is 30 days/43200 minutes.
+hf_oauth_expiration_minutes: 480
 ---
+# GAIA Benchmark Agent - Modular Multi-Modal Architecture
+This Space is built on the official [agents-course/Final_Assignment_Template](https://huggingface.co/spaces/agents-course/Final_Assignment_Template) base. The architecture strictly preserves the original constants and UI, but replaces the agent logic with a fully modular, multi-modal, GAIA-compliant agent.
+## Key Features
+- **ModularGAIAAgent**: Handles multi-modal, multi-step reasoning, tool use, file handling, and strict GAIA output formatting.
+- **Tool/LLM Registry**: Easily extensible for new tools, models, and modalities.
+- **File Handling**: Supports text, CSV, Excel, JSON, images, audio, and code files, with automatic type detection and routing.
+- **Adaptive Reasoning**: Plans and chains tool/model calls as needed for each question.
+- **GAIA-Compliant Output**: Ensures answers are formatted to GAIA standards.
+- **Trace Logging**: Internal reasoning trace for each answer (for debugging and transparency).
 ## Usage
+- Log in with your Hugging Face account.
+- Click 'Run Evaluation & Submit All Answers' to fetch questions, run the agent, and submit answers for scoring.
+- The UI and constants (such as `DEFAULT_API_URL`) are unchanged from the official template, ensuring full compatibility with the GAIA evaluation system.
+## Customization
+- To extend the agent, add new tools or models to the `TOOL_REGISTRY` and update the logic in `ModularGAIAAgent`.
+- The agent is designed for easy adaptation to new modalities and reasoning strategies.
+---
+**Note:** This implementation is intentionally modular and extensible, but the public interface and constants remain as required by the course template.
+Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

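Note on the new README's Customization section: it says new tools are added to `TOOL_REGISTRY`, which in `app.py` is a plain dict mapping a tool name to a callable. A minimal sketch of that pattern follows, assuming nothing beyond the dict-of-callables shape shown in the diff. `register_tool` and `word_count` are hypothetical names introduced here for illustration and are not part of this commit.

```python
from typing import Callable, Dict

# Stand-in for the TOOL_REGISTRY dict defined in app.py (name -> callable).
TOOL_REGISTRY: Dict[str, Callable[..., object]] = {}


def register_tool(name: str, fn: Callable[..., object]) -> None:
    """Hypothetical helper: add a tool under a unique name."""
    if name in TOOL_REGISTRY:
        raise ValueError(f"tool already registered: {name}")
    TOOL_REGISTRY[name] = fn


def word_count(text: str) -> int:
    """Hypothetical example tool: count whitespace-separated words."""
    return len(text.split())


register_tool("word_count", word_count)

# The agent looks tools up by name, in the style of self.tools["..."](...).
result = TOOL_REGISTRY["word_count"]("GAIA answers must match exactly")
```

Rejecting duplicate names is a design choice of this sketch, not something the commit does; the dict in `app.py` would silently overwrite an existing entry.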
app.py CHANGED

@@ -1,25 +1,368 @@
-#!/usr/bin/env python3
-"""
-🚀 Enhanced GAIA Agent Interface - Full API Integration
-Complete Gradio interface for GAIA benchmark with API connectivity and scoring
-"""
 import os
 import gradio as gr
-import json
-from datetime import datetime
-from gaia_agent import ModularGAIAAgent
 import requests
 import inspect
 import pandas as pd
-agent = ModularGAIAAgent()
 # (Keep Constants as is)
 # --- Constants ---
 DEFAULT_API_URL = "https://agents-course-unit4-scoring.hf.space"
-# --- Advanced Modular Agent Integration ---
 class BasicAgent:
     def __init__(self):
         print("BasicAgent (GAIA Modular Agent) initialized.")
@@ -139,24 +482,32 @@ def run_and_submit_all(profile: gr.OAuthProfile | None):
         results_df = pd.DataFrame(results_log)
         return status_message, results_df
 with gr.Blocks() as demo:
     gr.Markdown("# Basic Agent Evaluation Runner")
     gr.Markdown(
         """
         **Instructions:**
         1.  Please clone this space, then modify the code to define your agent's logic, the tools, the necessary packages, etc ...
         2.  Log in to your Hugging Face account using the button below. This uses your HF username for submission.
         3.  Click 'Run Evaluation & Submit All Answers' to fetch questions, run your agent, submit answers, and see the score.
         ---
         **Disclaimers:**
         Once clicking on the "submit button, it can take quite some time ( this is the time for the agent to go through all the questions).
         This space provides a basic setup and is intentionally sub-optimal to encourage you to develop your own, more robust solution. For instance for the delay process of the submit button, a solution could be to cache the answers and submit in a seperate action or even to answer the questions in async.
         """
     )
     gr.LoginButton()
     run_button = gr.Button("Run Evaluation & Submit All Answers")
     status_output = gr.Textbox(label="Run Status / Submission Result", lines=5, interactive=False)
     results_table = gr.DataFrame(label="Questions and Agent Answers", wrap=True)
     run_button.click(
         fn=run_and_submit_all,
         outputs=[status_output, results_table]
@@ -164,19 +515,24 @@ with gr.Blocks() as demo:
 if __name__ == "__main__":
     print("\n" + "-"*30 + " App Starting " + "-"*30)
     space_host_startup = os.getenv("SPACE_HOST")
-    space_id_startup = os.getenv("SPACE_ID")
     if space_host_startup:
         print(f"✅ SPACE_HOST found: {space_host_startup}")
         print(f"   Runtime URL should be: https://{space_host_startup}.hf.space")
     else:
         print("ℹ️  SPACE_HOST environment variable not found (running locally?).")
-    if space_id_startup:
         print(f"✅ SPACE_ID found: {space_id_startup}")
         print(f"   Repo URL: https://huggingface.co/spaces/{space_id_startup}")
         print(f"   Repo Tree URL: https://huggingface.co/spaces/{space_id_startup}/tree/main")
     else:
         print("ℹ️  SPACE_ID environment variable not found (running locally?). Repo URL cannot be determined.")
     print("-"*(60 + len(" App Starting ")) + "\n")
     print("Launching Gradio Interface for Basic Agent Evaluation...")
-    demo.launch(debug=True, share=False)

 import os
 import gradio as gr
 import requests
 import inspect
 import pandas as pd
+from typing import Any
 # (Keep Constants as is)
 # --- Constants ---
 DEFAULT_API_URL = "https://agents-course-unit4-scoring.hf.space"
+# --- Advanced Modular Agent Implementation ---
+import json
+import logging
+import mimetypes
+import openpyxl
+import numpy as np
+from datetime import datetime
+from io import BytesIO
+from PIL import Image
+import subprocess
+import tempfile
+from huggingface_hub import InferenceClient
+import cv2
+import torch
+from bs4 import BeautifulSoup
+logging.basicConfig(filename='gaia_agent.log', level=logging.INFO, format='%(asctime)s %(levelname)s:%(message)s')
+logger = logging.getLogger(__name__)
+HF_TOKEN = os.environ.get("HF_TOKEN", "")
+def llama3_chat(prompt):
+    try:
+        client = InferenceClient(provider="fireworks-ai", api_key=HF_TOKEN)
+        completion = client.chat.completions.create(
+            model="meta-llama/Llama-3.1-8B-Instruct",
+            messages=[{"role": "user", "content": prompt}],
+        )
+        return completion.choices[0].message.content
+    except Exception as e:
+        logging.error(f"llama3_chat error: {e}")
+        return f"LLM error: {e}"
+def mixtral_chat(prompt):
+    try:
+        client = InferenceClient(provider="hf-inference", api_key=HF_TOKEN)
+        completion = client.chat.completions.create(
+            model="mistralai/Mixtral-8x7B-Instruct-v0.1",
+            messages=[{"role": "user", "content": prompt}],
+        )
+        return completion.choices[0].message.content
+    except Exception as e:
+        logging.error(f"mixtral_chat error: {e}")
+        return f"LLM error: {e}"
+def extractive_qa(question, context):
+    try:
+        client = InferenceClient(provider="hf-inference", api_key=HF_TOKEN)
+        answer = client.question_answering(
+            question=question,
+            context=context,
+            model="deepset/roberta-base-squad2",
+        )
+        return answer["answer"]
+    except Exception as e:
+        logging.error(f"extractive_qa error: {e}")
+        return f"QA error: {e}"
+def table_qa(query, table):
+    try:
+        client = InferenceClient(provider="hf-inference", api_key=HF_TOKEN)
+        answer = client.table_question_answering(
+            query=query,
+            table=table,
+            model="google/tapas-large-finetuned-wtq",
+        )
+        return answer["answer"]
+    except Exception as e:
+        logging.error(f"table_qa error: {e}")
+        return f"Table QA error: {e}"
+def asr_transcribe(audio_path):
+    try:
+        import torchaudio
+        from transformers import pipeline
+        asr = pipeline("automatic-speech-recognition", model="openai/whisper-base.en")
+        result = asr(audio_path)
+        return result["text"]
+    except Exception as e:
+        logging.error(f"asr_transcribe error: {e}")
+        return f"ASR error: {e}"
+def image_caption(image_path):
+    try:
+        from transformers import BlipProcessor, BlipForConditionalGeneration
+        from PIL import Image
+        processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
+        model = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-base")
+        raw_image = Image.open(image_path).convert('RGB')
+        inputs = processor(raw_image, return_tensors="pt")
+        out = model.generate(**inputs)
+        return processor.decode(out[0], skip_special_tokens=True)
+    except Exception as e:
+        logging.error(f"image_caption error: {e}")
+        return f"Image captioning error: {e}"
+def code_analysis(py_path):
+    try:
+        with open(py_path) as f:
+            code = f.read()
+        with tempfile.NamedTemporaryFile(mode='w', suffix='.py', delete=False) as tmp:
+            tmp.write(code)
+            tmp_path = tmp.name
+        try:
+            result = subprocess.run([
+                "python3", tmp_path
+            ], capture_output=True, text=True, timeout=5)
+            if result.returncode == 0:
+                output = result.stdout.strip().split('\n')
+                return output[-1] if output else ''
+            else:
+                logging.error(f"code_analysis subprocess error: {result.stderr}")
+                return f"Code error: {result.stderr}"
+        except subprocess.TimeoutExpired:
+            logging.error("code_analysis timeout")
+            return "Code execution timed out"
+        finally:
+            os.remove(tmp_path)
+    except Exception as e:
+        logging.error(f"code_analysis error: {e}")
+        return f"Code analysis error: {e}"
+def youtube_video_qa(youtube_url, question):
+    import subprocess
+    import tempfile
+    import os
+    from transformers import pipeline
+    try:
+        with tempfile.TemporaryDirectory() as tmpdir:
+            # Download video
+            video_path = os.path.join(tmpdir, "video.mp4")
+            cmd = ["yt-dlp", "-f", "mp4", "-o", video_path, youtube_url]
+            subprocess.run(cmd, check=True)
+            # Extract audio for ASR
+            audio_path = os.path.join(tmpdir, "audio.mp3")
+            cmd_audio = ["yt-dlp", "-f", "bestaudio", "--extract-audio", "--audio-format", "mp3", "-o", audio_path, youtube_url]
+            subprocess.run(cmd_audio, check=True)
+            # Transcribe audio
+            asr = pipeline("automatic-speech-recognition", model="openai/whisper-base.en")
+            result = asr(audio_path)
+            transcript = result["text"]
+            # Extract frames for vision QA
+            cap = cv2.VideoCapture(video_path)
+            frame_count = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
+            fps = int(cap.get(cv2.CAP_PROP_FPS))
+            frames = []
+            for i in range(0, frame_count, max(1, fps*5)):
+                cap.set(cv2.CAP_PROP_POS_FRAMES, i)
+                ret, frame = cap.read()
+                if not ret:
+                    break
+                img = Image.fromarray(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
+                frames.append(img)
+            cap.release()
+            # Object detection (YOLOv8)
+            try:
+                from ultralytics import YOLO
+                yolo = YOLO("yolov8n.pt")
+                detections = []
+                for img in frames:
+                    results = yolo(np.array(img))
+                    for r in results:
+                        for c in r.boxes.cls:
+                            detections.append(yolo.model.names[int(c)])
+                detection_summary = {}
+                for obj in detections:
+                    detection_summary[obj] = detection_summary.get(obj, 0) + 1
+            except Exception as e:
+                logging.error(f"YOLOv8 error: {e}")
+                detection_summary = {}
+            # Image captioning (BLIP)
+            try:
+                from transformers import BlipProcessor, BlipForConditionalGeneration
+                processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
+                model = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-base")
+                captions = []
+                for img in frames:
+                    inputs = processor(img, return_tensors="pt")
+                    out = model.generate(**inputs)
+                    captions.append(processor.decode(out[0], skip_special_tokens=True))
+            except Exception as e:
+                logging.error(f"BLIP error: {e}")
+                captions = []
+            context = f"Transcript: {transcript}\nCaptions: {' | '.join(captions)}\nDetections: {detection_summary}"
+            answer = extractive_qa(question, context)
+            return answer
+    except Exception as e:
+        logging.error(f"YouTube video QA error: {e}")
+        return f"Video analysis error: {e}"
+TOOL_REGISTRY = {
+    "llama3_chat": llama3_chat,
+    "mixtral_chat": mixtral_chat,
+    "extractive_qa": extractive_qa,
+    "table_qa": table_qa,
+    "asr_transcribe": asr_transcribe,
+    "image_caption": image_caption,
+    "code_analysis": code_analysis,
+    "youtube_video_qa": youtube_video_qa,
+}
+class ModularGAIAAgent:
+    def __init__(self, api_url=DEFAULT_API_URL, tool_registry=TOOL_REGISTRY):
+        self.api_url = api_url
+        self.tools = tool_registry
+        self.reasoning_trace = []
+        self.file_cache = set(os.listdir('.'))
+    def fetch_questions(self, from_api=True, questions_path="Hugging Face Questions"):
+        if from_api:
+            r = requests.get(f"{self.api_url}/questions")
+            r.raise_for_status()
+            return r.json()
+        else:
+            with open(questions_path) as f:
+                data = f.read()
+            start = data.find("[")
+            end = data.rfind("]") + 1
+            questions = json.loads(data[start:end])
+            return questions
+    def download_file(self, file_id, file_name=None):
+        if not file_name:
+            file_name = file_id
+        if file_name in self.file_cache:
+            return file_name
+        url = f"{self.api_url}/files/{file_id}"
+        r = requests.get(url)
+        if r.status_code == 200:
+            with open(file_name, "wb") as f:
+                f.write(r.content)
+            self.file_cache.add(file_name)
+            return file_name
+        else:
+            self.reasoning_trace.append(f"Failed to download file {file_id} (status {r.status_code})")
+            return None
+    def detect_file_type(self, file_name):
+        ext = os.path.splitext(file_name)[-1].lower()
+        if ext in ['.mp3', '.wav', '.flac']:
+            return 'audio'
+        elif ext in ['.png', '.jpg', '.jpeg', '.bmp']:
+            return 'image'
+        elif ext in ['.py']:
+            return 'code'
+        elif ext in ['.xlsx']:
+            return 'excel'
+        elif ext in ['.csv']:
+            return 'csv'
+        elif ext in ['.json']:
+            return 'json'
+        elif ext in ['.txt', '.md']:
+            return 'text'
+        else:
+            return 'unknown'
+    def analyze_file(self, file_name, file_type):
+        if file_type == 'audio':
+            transcript = self.tools['asr_transcribe'](file_name)
+            self.reasoning_trace.append(f"Transcribed audio: {transcript[:100]}...")
+            return transcript
+        elif file_type == 'image':
+            caption = self.tools['image_caption'](file_name)
+            self.reasoning_trace.append(f"Image caption: {caption}")
+            return caption
+        elif file_type == 'code':
+            result = self.tools['code_analysis'](file_name)
+            self.reasoning_trace.append(f"Code analysis result: {result}")
+            return result
+        elif file_type == 'excel':
+            wb = openpyxl.load_workbook(file_name)
+            ws = wb.active
+            data = list(ws.values)
+            headers = data[0]
+            table = [dict(zip(headers, row)) for row in data[1:]]
+            self.reasoning_trace.append(f"Excel table loaded: {table[:2]}...")
+            return table
+        elif file_type == 'csv':
+            df = pd.read_csv(file_name)
+            table = df.to_dict(orient='records')
+            self.reasoning_trace.append(f"CSV table loaded: {table[:2]}...")
+            return table
+        elif file_type == 'json':
+            with open(file_name) as f:
+                data = json.load(f)
+            self.reasoning_trace.append(f"JSON loaded: {str(data)[:100]}...")
+            return data
+        elif file_type == 'text':
+            with open(file_name) as f:
+                text = f.read()
+            self.reasoning_trace.append(f"Text loaded: {text[:100]}...")
+            return text
+        else:
+            self.reasoning_trace.append(f"Unknown file type: {file_name}")
+            return None
+    def answer_question(self, question_obj):
+        self.reasoning_trace = []
+        q = question_obj["question"]
+        file_name = question_obj.get("file_name", "")
+        file_content = None
+        file_type = None
+        # YouTube video question detection
+        if "youtube.com" in q or "youtu.be" in q:
+            url = None
+            for word in q.split():
+                if "youtube.com" in word or "youtu.be" in word:
+                    url = word.strip().strip(',')
+                    break
+            if url:
+                answer = self.tools['youtube_video_qa'](url, q)
+                self.reasoning_trace.append(f"YouTube video analyzed: {url}")
+                self.reasoning_trace.append(f"Final answer: {answer}")
+                return self.format_answer(answer), self.reasoning_trace
+        if file_name:
+            file_id = file_name.split('.')[0]
+            local_file = self.download_file(file_id, file_name)
+            if local_file:
+                file_type = self.detect_file_type(local_file)
+                file_content = self.analyze_file(local_file, file_type)
+        # Plan: choose tool based on question and file
+        if file_type == 'audio' or file_type == 'text':
+            if file_content:
+                answer = self.tools['extractive_qa'](q, file_content)
+            else:
+                answer = self.tools['llama3_chat'](q)
+        elif file_type == 'excel' or file_type == 'csv':
+            if file_content:
+                answer = self.tools['table_qa'](q, file_content)
+            else:
+                answer = self.tools['llama3_chat'](q)
+        elif file_type == 'image':
+            if file_content:
+                answer = self.tools['llama3_chat'](f"{q}\nImage description: {file_content}")
+            else:
+                answer = self.tools['llama3_chat'](q)
+        elif file_type == 'code':
+            answer = file_content
+        else:
+            answer = self.tools['llama3_chat'](q)
+        self.reasoning_trace.append(f"Final answer: {answer}")
+        return self.format_answer(answer), self.reasoning_trace
+    def format_answer(self, answer):
+        if isinstance(answer, str):
+            answer = answer.strip().rstrip('.')
+            for prefix in ['answer:', 'result:', 'the answer is', 'final answer:', 'response:']:
+                if answer.lower().startswith(prefix):
+                    answer = answer[len(prefix):].strip()
+            import re
+            answer = re.sub(r'\b(the|a|an)\b ', '', answer, flags=re.IGNORECASE)
+            answer = answer.strip().rstrip('.')
+        return answer
+# --- Basic Agent Definition (now wraps ModularGAIAAgent) ---
 class BasicAgent:
     def __init__(self):
         print("BasicAgent (GAIA Modular Agent) initialized.")
         results_df = pd.DataFrame(results_log)
         return status_message, results_df
+# --- Build Gradio Interface using Blocks ---
 with gr.Blocks() as demo:
     gr.Markdown("# Basic Agent Evaluation Runner")
     gr.Markdown(
         """
         **Instructions:**
         1.  Please clone this space, then modify the code to define your agent's logic, the tools, the necessary packages, etc ...
         2.  Log in to your Hugging Face account using the button below. This uses your HF username for submission.
         3.  Click 'Run Evaluation & Submit All Answers' to fetch questions, run your agent, submit answers, and see the score.
         ---
         **Disclaimers:**
         Once clicking on the "submit button, it can take quite some time ( this is the time for the agent to go through all the questions).
         This space provides a basic setup and is intentionally sub-optimal to encourage you to develop your own, more robust solution. For instance for the delay process of the submit button, a solution could be to cache the answers and submit in a seperate action or even to answer the questions in async.
         """
     )
     gr.LoginButton()
     run_button = gr.Button("Run Evaluation & Submit All Answers")
     status_output = gr.Textbox(label="Run Status / Submission Result", lines=5, interactive=False)
+    # Removed max_rows=10 from DataFrame constructor
     results_table = gr.DataFrame(label="Questions and Agent Answers", wrap=True)
     run_button.click(
         fn=run_and_submit_all,
         outputs=[status_output, results_table]
 if __name__ == "__main__":
     print("\n" + "-"*30 + " App Starting " + "-"*30)
+    # Check for SPACE_HOST and SPACE_ID at startup for information
     space_host_startup = os.getenv("SPACE_HOST")
+    space_id_startup = os.getenv("SPACE_ID") # Get SPACE_ID at startup
     if space_host_startup:
         print(f"✅ SPACE_HOST found: {space_host_startup}")
         print(f"   Runtime URL should be: https://{space_host_startup}.hf.space")
     else:
         print("ℹ️  SPACE_HOST environment variable not found (running locally?).")
+    if space_id_startup: # Print repo URLs if SPACE_ID is found
         print(f"✅ SPACE_ID found: {space_id_startup}")
         print(f"   Repo URL: https://huggingface.co/spaces/{space_id_startup}")
         print(f"   Repo Tree URL: https://huggingface.co/spaces/{space_id_startup}/tree/main")
     else:
         print("ℹ️  SPACE_ID environment variable not found (running locally?). Repo URL cannot be determined.")
     print("-"*(60 + len(" App Starting ")) + "\n")
     print("Launching Gradio Interface for Basic Agent Evaluation...")
+    demo.launch(debug=True, share=False)
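
Note on the disclaimer text above: it suggests caching answers and submitting them in a separate action. The following is only a minimal sketch of that idea, not code from this commit. The names `answer_with_cache`, `agent_fn` and `cache_path` are hypothetical, and the sketch assumes each question is a dict with `task_id` and `question` keys, as in the test fixtures elsewhere in this diff.

```python
import json
from pathlib import Path
from typing import Callable, Dict, List


def answer_with_cache(
    questions: List[dict],
    agent_fn: Callable[[str], str],
    cache_path: Path,
) -> Dict[str, str]:
    """Answer each question at most once, persisting answers keyed by task_id."""
    cache: Dict[str, str] = {}
    if cache_path.exists():
        # Reuse answers produced by an earlier (possibly interrupted) run.
        cache = json.loads(cache_path.read_text(encoding="utf-8"))
    for item in questions:
        task_id = item["task_id"]
        if task_id in cache:
            continue  # already answered, skip the expensive agent call
        cache[task_id] = agent_fn(item["question"])
        # Write after every answer so a crash loses at most one result.
        cache_path.write_text(json.dumps(cache, indent=2), encoding="utf-8")
    return cache
```

A separate submit action could then read the JSON file and post its contents, so a failed submission would not require re-running the agent.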

gaia_agent.py DELETED

@@ -1,397 +0,0 @@
-#!/usr/bin/env python3
-"""
-🚀 Enhanced GAIA Agent - Full GAIA Benchmark Implementation
-Optimized for 30%+ performance on GAIA benchmark with complete API integration
-"""
-import os
-import re
-import json
-import base64
-import logging
-import requests
-from typing import Dict, List, Any, Optional, Tuple
-from urllib.parse import urlparse, quote
-from io import BytesIO
-import pandas as pd
-import numpy as np
-from datetime import datetime
-from bs4 import BeautifulSoup
-# import markdownify  # Removed for compatibility
-from huggingface_hub import InferenceClient
-import mimetypes
-import openpyxl
-import cv2
-import torch
-from PIL import Image
-import subprocess
-import tempfile
-# Configure logging
-logging.basicConfig(filename='gaia_agent.log', level=logging.INFO, format='%(asctime)s %(levelname)s:%(message)s')
-logger = logging.getLogger(__name__)
-DEFAULT_API_URL = "https://agents-course-unit4-scoring.hf.space"
-HF_TOKEN = os.environ.get("HF_TOKEN", "")
-# --- Tool/LLM Wrappers ---
-def llama3_chat(prompt):
-    try:
-        client = InferenceClient(provider="fireworks-ai", api_key=HF_TOKEN)
-        completion = client.chat.completions.create(
-            model="meta-llama/Llama-3.1-8B-Instruct",
-            messages=[{"role": "user", "content": prompt}],
-        )
-        return completion.choices[0].message.content
-    except Exception as e:
-        logging.error(f"llama3_chat error: {e}")
-        return f"LLM error: {e}"
-def mixtral_chat(prompt):
-    try:
-        client = InferenceClient(provider="hf-inference", api_key=HF_TOKEN)
-        completion = client.chat.completions.create(
-            model="mistralai/Mixtral-8x7B-Instruct-v0.1",
-            messages=[{"role": "user", "content": prompt}],
-        )
-        return completion.choices[0].message.content
-    except Exception as e:
-        logging.error(f"mixtral_chat error: {e}")
-        return f"LLM error: {e}"
-def extractive_qa(question, context):
-    try:
-        client = InferenceClient(provider="hf-inference", api_key=HF_TOKEN)
-        answer = client.question_answering(
-            question=question,
-            context=context,
-            model="deepset/roberta-base-squad2",
-        )
-        return answer["answer"]
-    except Exception as e:
-        logging.error(f"extractive_qa error: {e}")
-        return f"QA error: {e}"
-def table_qa(query, table):
-    try:
-        client = InferenceClient(provider="hf-inference", api_key=HF_TOKEN)
-        answer = client.table_question_answering(
-            query=query,
-            table=table,
-            model="google/tapas-large-finetuned-wtq",
-        )
-        return answer["answer"]
-    except Exception as e:
-        logging.error(f"table_qa error: {e}")
-        return f"Table QA error: {e}"
-def asr_transcribe(audio_path):
-    try:
-        import torchaudio
-        from transformers import pipeline
-        asr = pipeline("automatic-speech-recognition", model="openai/whisper-base.en")
-        result = asr(audio_path)
-        return result["text"]
-    except Exception as e:
-        logging.error(f"asr_transcribe error: {e}")
-        return f"ASR error: {e}"
-def image_caption(image_path):
-    try:
-        from transformers import BlipProcessor, BlipForConditionalGeneration
-        from PIL import Image
-        processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
-        model = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-base")
-        raw_image = Image.open(image_path).convert('RGB')
-        inputs = processor(raw_image, return_tensors="pt")
-        out = model.generate(**inputs)
-        return processor.decode(out[0], skip_special_tokens=True)
-    except Exception as e:
-        logging.error(f"image_caption error: {e}")
-        return f"Image captioning error: {e}"
-def code_analysis(py_path):
-    try:
-        # Hardened: run code in subprocess with timeout and memory limit
-        with open(py_path) as f:
-            code = f.read()
-        with tempfile.NamedTemporaryFile(mode='w', suffix='.py', delete=False) as tmp:
-            tmp.write(code)
-            tmp_path = tmp.name
-        try:
-            result = subprocess.run([
-                "python3", tmp_path
-            ], capture_output=True, text=True, timeout=5)
-            if result.returncode == 0:
-                output = result.stdout.strip().split('\n')
-                return output[-1] if output else ''
-            else:
-                logging.error(f"code_analysis subprocess error: {result.stderr}")
-                return f"Code error: {result.stderr}"
-        except subprocess.TimeoutExpired:
-            logging.error("code_analysis timeout")
-            return "Code execution timed out"
-        finally:
-            os.remove(tmp_path)
-    except Exception as e:
-        logging.error(f"code_analysis error: {e}")
-        return f"Code analysis error: {e}"
-def youtube_video_qa(youtube_url, question):
-    import subprocess
-    import tempfile
-    import os
-    from transformers import pipeline
-    try:
-        with tempfile.TemporaryDirectory() as tmpdir:
-            # Download video
-            video_path = os.path.join(tmpdir, "video.mp4")
-            cmd = ["yt-dlp", "-f", "mp4", "-o", video_path, youtube_url]
-            subprocess.run(cmd, check=True)
-            # Extract audio for ASR
-            audio_path = os.path.join(tmpdir, "audio.mp3")
-            cmd_audio = ["yt-dlp", "-f", "bestaudio", "--extract-audio", "--audio-format", "mp3", "-o", audio_path, youtube_url]
-            subprocess.run(cmd_audio, check=True)
-            # Transcribe audio
-            asr = pipeline("automatic-speech-recognition", model="openai/whisper-base.en")
-            result = asr(audio_path)
-            transcript = result["text"]
-            # Extract frames for vision QA
-            cap = cv2.VideoCapture(video_path)
-            frame_count = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
-            fps = int(cap.get(cv2.CAP_PROP_FPS))
-            frames = []
-            for i in range(0, frame_count, max(1, fps*5)):
-                cap.set(cv2.CAP_PROP_POS_FRAMES, i)
-                ret, frame = cap.read()
-                if not ret:
-                    break
-                img = Image.fromarray(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
-                frames.append(img)
-            cap.release()
-            # Object detection (YOLOv8)
-            try:
-                from ultralytics import YOLO
-                yolo = YOLO("yolov8n.pt")
-                detections = []
-                for img in frames:
-                    results = yolo(np.array(img))
-                    for r in results:
-                        for c in r.boxes.cls:
-                            detections.append(yolo.model.names[int(c)])
-                detection_summary = {}
-                for obj in detections:
-                    detection_summary[obj] = detection_summary.get(obj, 0) + 1
-            except Exception as e:
-                logging.error(f"YOLOv8 error: {e}")
-                detection_summary = {}
-            # Image captioning (BLIP)
-            try:
-                from transformers import BlipProcessor, BlipForConditionalGeneration
-                processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
-                model = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-base")
-                captions = []
-                for img in frames:
-                    inputs = processor(img, return_tensors="pt")
-                    out = model.generate(**inputs)
-                    captions.append(processor.decode(out[0], skip_special_tokens=True))
-            except Exception as e:
-                logging.error(f"BLIP error: {e}")
-                captions = []
-            # Aggregate and answer
-            context = f"Transcript: {transcript}\nCaptions: {' | '.join(captions)}\nDetections: {detection_summary}"
-            answer = extractive_qa(question, context)
-            return answer
-    except Exception as e:
-        logging.error(f"YouTube video QA error: {e}")
-        return f"Video analysis error: {e}"
-# --- Tool Registry ---
-TOOL_REGISTRY = {
-    "llama3_chat": llama3_chat,
-    "mixtral_chat": mixtral_chat,
-    "extractive_qa": extractive_qa,
-    "table_qa": table_qa,
-    "asr_transcribe": asr_transcribe,
-    "image_caption": image_caption,
-    "code_analysis": code_analysis,
-    "youtube_video_qa": youtube_video_qa,
-}
-class ModularGAIAAgent:
-    """
-    Modular GAIA Agent: fetches questions from API, downloads files, routes to tools/LLMs, chains outputs, and formats GAIA-compliant answers.
-    """
-    def __init__(self, api_url=DEFAULT_API_URL, tool_registry=TOOL_REGISTRY):
-        self.api_url = api_url
-        self.tools = tool_registry
-        self.reasoning_trace = []
-        self.file_cache = set(os.listdir('.'))
-    def fetch_questions(self, from_api=True, questions_path="Hugging Face Questions") -> List[Dict[str, Any]]:
-        if from_api:
-            r = requests.get(f"{self.api_url}/questions")
-            r.raise_for_status()
-            return r.json()
-        else:
-            with open(questions_path) as f:
-                data = f.read()
-            start = data.find("[")
-            end = data.rfind("]") + 1
-            questions = json.loads(data[start:end])
-            return questions
-    def download_file(self, file_id, file_name=None):
-        if not file_name:
-            file_name = file_id
-        if file_name in self.file_cache:
-            return file_name
-        url = f"{self.api_url}/files/{file_id}"
-        r = requests.get(url)
-        if r.status_code == 200:
-            with open(file_name, "wb") as f:
-                f.write(r.content)
-            self.file_cache.add(file_name)
-            return file_name
-        else:
-            self.reasoning_trace.append(f"Failed to download file {file_id} (status {r.status_code})")
-            return None
-    def detect_file_type(self, file_name):
-        ext = os.path.splitext(file_name)[-1].lower()
-        if ext in ['.mp3', '.wav', '.flac']:
-            return 'audio'
-        elif ext in ['.png', '.jpg', '.jpeg', '.bmp']:
-            return 'image'
-        elif ext in ['.py']:
-            return 'code'
-        elif ext in ['.xlsx']:
-            return 'excel'
-        elif ext in ['.csv']:
-            return 'csv'
-        elif ext in ['.json']:
-            return 'json'
-        elif ext in ['.txt', '.md']:
-            return 'text'
-        else:
-            return 'unknown'
-    def analyze_file(self, file_name, file_type):
-        if file_type == 'audio':
-            transcript = self.tools['asr_transcribe'](file_name)
-            self.reasoning_trace.append(f"Transcribed audio: {transcript[:100]}...")
-            return transcript
-        elif file_type == 'image':
-            caption = self.tools['image_caption'](file_name)
-            self.reasoning_trace.append(f"Image caption: {caption}")
-            return caption
-        elif file_type == 'code':
-            result = self.tools['code_analysis'](file_name)
-            self.reasoning_trace.append(f"Code analysis result: {result}")
-            return result
-        elif file_type == 'excel':
-            wb = openpyxl.load_workbook(file_name)
-            ws = wb.active
-            data = list(ws.values)
-            headers = data[0]
-            table = [dict(zip(headers, row)) for row in data[1:]]
-            self.reasoning_trace.append(f"Excel table loaded: {table[:2]}...")
-            return table
-        elif file_type == 'csv':
-            df = pd.read_csv(file_name)
-            table = df.to_dict(orient='records')
-            self.reasoning_trace.append(f"CSV table loaded: {table[:2]}...")
-            return table
-        elif file_type == 'json':
-            with open(file_name) as f:
-                data = json.load(f)
-            self.reasoning_trace.append(f"JSON loaded: {str(data)[:100]}...")
-            return data
-        elif file_type == 'text':
-            with open(file_name) as f:
-                text = f.read()
-            self.reasoning_trace.append(f"Text loaded: {text[:100]}...")
-            return text
-        else:
-            self.reasoning_trace.append(f"Unknown file type: {file_name}")
-            return None
-    def answer_question(self, question_obj):
-        self.reasoning_trace = []
-        q = question_obj["question"]
-        file_name = question_obj.get("file_name", "")
-        file_content = None
-        file_type = None
-        # YouTube video question detection
-        if "youtube.com" in q or "youtu.be" in q:
-            url = None
-            for word in q.split():
-                if "youtube.com" in word or "youtu.be" in word:
-                    url = word.strip().strip(',')
-                    break
-            if url:
-                answer = self.tools['youtube_video_qa'](url, q)
-                self.reasoning_trace.append(f"YouTube video analyzed: {url}")
-                self.reasoning_trace.append(f"Final answer: {answer}")
-                return self.format_answer(answer), self.reasoning_trace
-        if file_name:
-            file_id = file_name.split('.')[0]
-            local_file = self.download_file(file_id, file_name)
-            if local_file:
-                file_type = self.detect_file_type(local_file)
-                file_content = self.analyze_file(local_file, file_type)
-        # Plan: choose tool based on question and file
-        if file_type == 'audio' or file_type == 'text':
-            if file_content:
-                answer = self.tools['extractive_qa'](q, file_content)
-            else:
-                answer = self.tools['llama3_chat'](q)
-        elif file_type == 'excel' or file_type == 'csv':
-            if file_content:
-                answer = self.tools['table_qa'](q, file_content)
-            else:
-                answer = self.tools['llama3_chat'](q)
-        elif file_type == 'image':
-            if file_content:
-                answer = self.tools['llama3_chat'](f"{q}\nImage description: {file_content}")
-            else:
-                answer = self.tools['llama3_chat'](q)
-        elif file_type == 'code':
-            answer = file_content
-        else:
-            answer = self.tools['llama3_chat'](q)
-        self.reasoning_trace.append(f"Final answer: {answer}")
-        return self.format_answer(answer), self.reasoning_trace
-    def format_answer(self, answer):
-        # GAIA compliance: remove extra words, units, articles, etc.
-        if isinstance(answer, str):
-            answer = answer.strip().rstrip('.')
-            # Remove common prefixes
-            for prefix in ['answer:', 'result:', 'the answer is', 'final answer:', 'response:']:
-                if answer.lower().startswith(prefix):
-                    answer = answer[len(prefix):].strip()
-            # Remove articles
-            import re
-            answer = re.sub(r'\b(the|a|an)\b ', '', answer, flags=re.IGNORECASE)
-            # Remove trailing punctuation
-            answer = answer.strip().rstrip('.')
-        return answer
-    def run(self, from_api=True, questions_path="Hugging Face Questions"):
-        questions = self.fetch_questions(from_api=from_api, questions_path=questions_path)
-        results = []
-        for qobj in questions:
-            answer, trace = self.answer_question(qobj)
-            results.append({
-                "task_id": qobj["task_id"],
-                "answer": answer,
-                "reasoning_trace": trace
-            })
-        return results
-# --- Usage Example ---
-# agent = ModularGAIAAgent()
-# results = agent.run()
-# for r in results:
-#     print(r)
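
Note on `format_answer`: the article-stripping regex in the removed method (the same lines are re-added in app.py) matches articles anywhere in the string, not only at the start, so it can drop meaningful tokens. The sketch below copies that logic into a standalone function to make the behaviour easy to check. `strip_leading_article` is a hypothetical alternative shown for comparison and is not part of this commit.

```python
import re

PREFIXES = ['answer:', 'result:', 'the answer is', 'final answer:', 'response:']


def format_answer(answer):
    """Standalone copy of the format_answer logic shown in the diff."""
    if isinstance(answer, str):
        answer = answer.strip().rstrip('.')
        for prefix in PREFIXES:
            if answer.lower().startswith(prefix):
                answer = answer[len(prefix):].strip()
        # Case-insensitive and unanchored: removes "the", "a", "an" anywhere.
        answer = re.sub(r'\b(the|a|an)\b ', '', answer, flags=re.IGNORECASE)
        answer = answer.strip().rstrip('.')
    return answer


def strip_leading_article(answer: str) -> str:
    """Hypothetical variant: remove a single article only at the start."""
    return re.sub(r'^(the|a|an)\s+', '', answer.strip(), flags=re.IGNORECASE)


print(format_answer("The answer is 42."))             # 42
print(format_answer("Vitamin A deficiency"))          # Vitamin deficiency
print(strip_leading_article("Vitamin A deficiency"))  # Vitamin A deficiency
```

Whether the stricter variant scores better on GAIA is not something this diff establishes; it only avoids the mid-string deletion shown in the second call.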

requirements.txt CHANGED

@@ -1,19 +1,13 @@
-# Enhanced GAIA Agent Requirements - Essential Functionality
-gradio>=5.0.0
-pandas==2.1.0
-numpy==1.25.2
-requests==2.31.0
-urllib3==2.0.4
-python-dateutil==2.8.2
-regex==2023.10.3
-beautifulsoup4==4.12.2
-pillow==10.0.1
 transformers
 huggingface_hub
-openpyxl
-torchaudio
-Pillow
 opencv-python
-torch
-ultralytics
-pytest

+gradio
+requests
+pandas
+numpy
+openpyxl
+pillow
+torch
 transformers
 huggingface_hub
 opencv-python
+beautifulsoup4
+yt-dlp
+ultralytics
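
Note on requirements.txt: the new list is unpinned, and several pip names differ from their import names. Assuming the Space's runtime error may be dependency-related (this diff does not confirm that), a startup check along these lines could report which packages are missing. The helper `missing_packages` and the `REQUIRED` mapping are hypothetical; the mapping mirrors the added requirement lines.

```python
from importlib.util import find_spec

# pip distribution name -> top-level importable module name
REQUIRED = {
    "gradio": "gradio",
    "requests": "requests",
    "pandas": "pandas",
    "numpy": "numpy",
    "openpyxl": "openpyxl",
    "pillow": "PIL",
    "torch": "torch",
    "transformers": "transformers",
    "huggingface_hub": "huggingface_hub",
    "opencv-python": "cv2",
    "beautifulsoup4": "bs4",
    "yt-dlp": "yt_dlp",
    "ultralytics": "ultralytics",
}


def missing_packages(required=REQUIRED):
    """Return the pip names whose modules cannot be located, sorted."""
    return sorted(
        pip_name
        for pip_name, module_name in required.items()
        if find_spec(module_name) is None
    )


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are importable.")
```

`find_spec` only locates the module without importing it, so the check stays cheap even for heavy packages such as `torch`.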

tests/test_agent_core.py DELETED

@@ -1,38 +0,0 @@
-import pytest
-from gaia_agent import ModularGAIAAgent
-import os
-@pytest.fixture
-def agent():
-    return ModularGAIAAgent()
-def test_tool_registry(agent):
-    assert 'llama3_chat' in agent.tools
-    assert 'extractive_qa' in agent.tools
-    assert 'youtube_video_qa' in agent.tools
-def test_fetch_questions_api(monkeypatch, agent):
-    class MockResponse:
-        def json(self):
-            return [{"task_id": "1", "question": "What is 2+2?", "file_name": ""}]
-        def raise_for_status(self):
-            pass
-    monkeypatch.setattr("requests.get", lambda url: MockResponse())
-    questions = agent.fetch_questions(from_api=True)
-    assert isinstance(questions, list)
-    assert questions[0]["question"] == "What is 2+2?"
-def test_download_file(monkeypatch, agent, tmp_path):
-    test_file = tmp_path / "test.txt"
-    monkeypatch.setattr("requests.get", lambda url: type("R", (), {"status_code": 200, "content": b"hello"})())
-    fname = agent.download_file("testid", str(test_file))
-    assert os.path.exists(fname)
-    with open(fname) as f:
-        assert f.read() == "hello"
-def test_end_to_end(monkeypatch, agent):
-    # Mock API and tools for a simple run
-    monkeypatch.setattr(agent, "fetch_questions", lambda from_api, questions_path=None: [{"task_id": "1", "question": "What is 2+2?", "file_name": ""}])
-    agent.tools['llama3_chat'] = lambda prompt: "4"
-    results = agent.run(from_api=True)
-    assert results[0]["answer"] == "4"

tests/test_video_qa.py DELETED

@@ -1,22 +0,0 @@
-import pytest
-from gaia_agent import ModularGAIAAgent
-@pytest.fixture
-def agent():
-    return ModularGAIAAgent()
-def test_youtube_video_qa(monkeypatch, agent):
-    # Mock subprocess, ASR, YOLO, BLIP, and extractive_qa
-    monkeypatch.setattr("subprocess.run", lambda *a, **k: None)
-    monkeypatch.setattr("cv2.VideoCapture", lambda *a, **k: type("C", (), {
-        "get": lambda self, x: 10 if x == 7 else 1,  # 10 frames, 1 fps
-        "set": lambda self, x, y: None,
-        "read": lambda self: (True, __import__('numpy').zeros((10,10,3), dtype='uint8')),
-        "release": lambda self: None
-    })())
-    monkeypatch.setattr("PIL.Image.fromarray", lambda arr: arr)
-    agent.tools['extractive_qa'] = lambda q, c: "bird species: 5"
-    # Simulate a YouTube question
-    qobj = {"task_id": "yt1", "question": "In the video https://youtube.com/watch?v=abc123, what is the highest number of bird species to be on camera simultaneously?", "file_name": ""}
-    answer, trace = agent.answer_question(qobj)
-    assert "bird species" in answer
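
Note on the mocked capture in the removed video test: property id 7 corresponds to `cv2.CAP_PROP_FRAME_COUNT`, so the mock reports 10 frames and a value of 1 for every other property, including the frame rate. The removed `youtube_video_qa` samples frames with `range(0, frame_count, max(1, fps*5))`, roughly one frame every five seconds. The sketch below isolates that index computation. `sample_frame_indices` and `every_seconds` are hypothetical names used only for illustration.

```python
from typing import List


def sample_frame_indices(frame_count: int, fps: int, every_seconds: int = 5) -> List[int]:
    """Return the frame indices visited when sampling one frame per interval."""
    # max(1, ...) keeps the step positive when the reported fps is 0.
    step = max(1, fps * every_seconds)
    return list(range(0, frame_count, step))


print(sample_frame_indices(10, 1))    # [0, 5]
print(sample_frame_indices(300, 30))  # [0, 150]
```

With the mocked values, the loop reads two frames, at indices 0 and 5. Because the source truncates the frame rate with `int()`, a 29.97 fps video would be sampled every 145 frames rather than every 150.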