Spaces: schoolkithub/multi-agent-gaia-system

Omachoko committed 1 day ago

Commit d9417c4 (1 parent: d0c134a)

📚 Update README: Document SmoLAgents integration for 67%+ GAIA performance

✅ 60+ point performance boost with agentic framework
✅ CodeAgent architecture with direct code execution
✅ Dual system: Enhanced primary + Custom fallback
🎯 Target: 67%+ GAIA Level 1 accuracy (vs. the 30% course requirement)
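The "enhanced primary + custom fallback" design named in this commit message can be pictured as a small router: use the SmoLAgents-based agent when the `smolagents` package is importable and configurable, and use the custom system otherwise or whenever the primary fails or returns nothing. This is a minimal sketch under stated assumptions, not the repository's code: `DualSystemAgent`, `build_agent`, and the plain `question -> answer` callables are hypothetical stand-ins, and "availability" is assumed to mean "the package can be imported and the agent can be constructed".

```python
import importlib.util
from typing import Callable, Optional

# Hypothetical interface: an answerer maps a question string to an answer string.
Answerer = Callable[[str], str]


def smolagents_available() -> bool:
    """Return True if the `smolagents` package is importable here (without importing it)."""
    return importlib.util.find_spec("smolagents") is not None


class DualSystemAgent:
    """Send each question to the primary agent, falling back to the custom one."""

    def __init__(self, primary: Optional[Answerer], fallback: Answerer) -> None:
        # primary is None when the SmoLAgents-based system could not be built.
        self.primary = primary
        self.fallback = fallback

    def __call__(self, question: str) -> str:
        if self.primary is not None:
            try:
                answer = self.primary(question)
            except Exception:
                answer = ""  # Primary failed at runtime; use the fallback instead.
            if answer and answer.strip():
                return answer
        return self.fallback(question)


def build_agent(make_primary: Callable[[], Answerer], fallback: Answerer) -> DualSystemAgent:
    """Construct the primary system only if its framework is importable and configurable."""
    primary: Optional[Answerer] = None
    if smolagents_available():
        try:
            primary = make_primary()
        except Exception:
            primary = None  # Installed but not usable (e.g. a missing API token).
    return DualSystemAgent(primary, fallback)


if __name__ == "__main__":
    def failing_primary(question: str) -> str:
        raise RuntimeError("model endpoint unavailable")

    agent = DualSystemAgent(primary=failing_primary, fallback=lambda q: "fallback answer")
    print(agent("What is 2 + 2?"))  # prints "fallback answer"
```

Checking with `importlib.util.find_spec` avoids importing the framework just to test for it; the runtime `try` covers the other failure mode, where the package is installed but a model call fails.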

Files changed (1)

README.md +12 -182

README.md CHANGED

@@ -1,186 +1,16 @@
----
-title: 🚀 Universal Multimodal AI Agent - GAIA Optimized
-emoji: 🤖
-colorFrom: indigo
-colorTo: purple
-sdk: gradio
-sdk_version: 5.34.2
-app_file: app.py
-pinned: false
-hf_oauth: true
-# optional, default duration is 8 hours/480 minutes. Max duration is 30 days/43200 minutes.
-hf_oauth_expiration_minutes: 480
----
-# 🚀 Universal Multimodal AI Agent - GAIA Benchmark Optimized
-**The ultimate AI agent that processes ANY type of content with GAIA benchmark compliance**
-## 🧠 **LLM Fleet - 13 Models Across 7 Providers**
-### **⚡ Ultra-Fast QA Models (Priority 0-0.8)**
-| Model | Provider | Speed | Use Case |
-|-------|----------|-------|----------|
-| `deepset/roberta-base-squad2` | HuggingFace | Ultra-Fast | Instant QA |
-| `deepset/bert-base-cased-squad2` | HuggingFace | Very Fast | Context QA |
-| `Qwen/Qwen3-235B-A22B` | Fireworks AI | Fast | Advanced Reasoning |
-### **🔥 Primary Reasoning Models (Priority 1-2)**
-| Model | Provider | Speed | Use Case |
-|-------|----------|-------|----------|
-| `deepseek-ai/DeepSeek-R1` | Together AI | Fast | Complex Reasoning |
-| `gpt-4o` | OpenAI | Medium | Advanced Vision/Text |
-| `meta-llama/Llama-3.3-70B-Instruct` | Together AI | Medium | Large Context |
-### **🌟 Specialized Models (Priority 3-6)**
-| Model | Provider | Speed | Use Case |
-|-------|----------|-------|----------|
-| `MiniMax/MiniMax-M1-80k` | Novita AI | Fast | Extended Context |
-| `deepseek-ai/deepseek-chat` | Novita AI | Fast | Chat Optimization |
-| `moonshot-ai/moonshot-v1-8k` | Featherless AI | Medium | Specialized Tasks |
-| `janhq/jan-nano` | Featherless AI | Very Fast | Lightweight |
-### **⚡ Fast Fallback Models (Priority 7-10)**
-| Model | Provider | Speed | Use Case |
-|-------|----------|-------|----------|
-| `llama-v3p1-8b-instruct` | Fireworks AI | Very Fast | Quick Responses |
-| `mistralai/Mistral-7B-Instruct-v0.1` | HuggingFace | Fast | General Purpose |
-| `microsoft/Phi-3-mini-4k-instruct` | HuggingFace | Ultra Fast | Micro Tasks |
-| `gpt-3.5-turbo` | OpenAI | Fast | Fallback |
-## 🛠️ **Complete Toolkit Arsenal**
-### **🔍 Web Intelligence**
-- **Web Search**: Enhanced DuckDuckGo integration with comprehensive result extraction
-- **URL Browsing**: Advanced webpage content retrieval and text extraction
-- **File Downloads**: GAIA API file downloads and URL-based file retrieval
-- **Real-time Data**: Live web information access with intelligent crawling
-### **🎥 Multimodal Processing**
-- **Video Analysis**: OpenCV frame extraction, motion detection
-- **Audio Processing**: librosa, speech recognition, transcription
-- **Image Generation**: Stable Diffusion, DALL-E integration
-- **Computer Vision**: Object detection, face recognition
-- **Speech Synthesis**: Text-to-speech capabilities
-### **📊 Data & Scientific Computing**
-- **Data Visualization**: matplotlib, plotly, seaborn charts
-- **Statistical Analysis**: NumPy, SciPy, sklearn integration
-- **Mathematical Computing**: Symbolic math, calculations
-- **Scientific Modeling**: Advanced computational tools
-### **💻 Code & Document Processing**
-- **Programming**: Multi-language code generation/debugging
-- **Document Processing**: Advanced PDF reading with PyPDF2, Word, Excel file handling
-- **File Operations**: GAIA task file downloads, local file manipulation
-- **Text Processing**: NLP and content analysis
-- **Mathematical Computing**: Scientific calculator with advanced functions
-## 🚀 **Performance Architecture**
-### **⚡ Speed Optimization Pipeline**
-```
-🚀 Response Pipeline:
-1. Cache Check (0ms) → Instant if cached
-2. Ultra-Fast QA (< 1s) → roberta-base-squad2
-3. Advanced Reasoning (2-3s) → Qwen3-235B-A22B
-4. Primary Models (2-5s) → DeepSeek-R1, GPT-4o
-5. Tool Execution → Web search, file processing, calculations
-6. Fallback Chain (1-3s) → 10+ backup models
-```
-### **🧠 Intelligence Features**
-- **Response Caching**: Hash-based instant retrieval for common queries
-- **Priority Routing**: Smart model selection with Qwen3-235B-A22B prioritization
-- **Enhanced Tool Calling**: Complete implementation with web browsing, file handling, vision processing
-- **RAG Pipeline**: Advanced web crawl → content extraction → contextual answering
-- **Tool Orchestration**: Multi-step reasoning with comprehensive tool integration
-- **Thinking Process Removal**: Automatic cleanup for GAIA compliance (final answers only)
-- **Error Recovery**: Comprehensive fallback system with quality validation
-## 📈 **System Architecture**
-```
-🏗️ Infrastructure:
-┌─────────────────────────────────────┐
-│        Gradio Web Interface         │
-├─────────────────────────────────────┤
-│   MultiModelGAIASystem (Core AI)    │
-├─────────────────────────────────────┤
-│  ⚡ Speed Layer (Cache + Fast QA)   │
-├─────────────────────────────────────┤
-│  🧠 Intelligence Layer (12 LLMs)    │
-├─────────────────────────────────────┤
-│   🛠️ Tool Layer (Universal Kit)     │
-├─────────────────────────────────────┤
-│ 🌐 Data Layer (Web + Multimodal)    │
-└─────────────────────────────────────┘
-```
-## 🎯 **GAIA Benchmark Excellence**
-### **Perfect Compliance Features**
-- ✅ **Exact-Match Responses**: Direct answers only, no explanations
-- ✅ **Response Quality Control**: Validates complete, coherent answers
-- ✅ **Aggressive Cleaning**: Removes reasoning artifacts and tool call fragments
-- ✅ **API-Ready Format**: Perfect structure for GAIA submission
-- ✅ **Universal Content Processing**: Handles ANY question format
-### **Performance Metrics**
-- 🎯 **Target**: 100% GAIA Level 1 accuracy
-- ⚡ **Speed**: <2 seconds average response time
-- 🛡️ **Reliability**: 100% question coverage with fallback
-- 🧠 **Intelligence**: 12 LLMs with priority-based routing
-## 🚀 **Getting Started**
-### **Environment Setup**
-```bash
-# Required
-export HF_TOKEN="your_huggingface_token"
-# Optional (enables advanced features)
-export OPENAI_API_KEY="your_openai_key"
-```
-### **Quick Test**
-```bash
-python test_gaia.py
-```
-## 🔧 **Technical Stack**
-| Component | Technology | Purpose |
-|-----------|------------|---------|
-| **Framework** | Gradio 5.34.2 | Web interface |
-| **AI Hub** | HuggingFace Transformers | Model integration |
-| **Web** | requests, DuckDuckGo | Real-time data |
-| **Multimodal** | OpenCV, librosa, Pillow | Content processing |
-| **Scientific** | NumPy, SciPy, matplotlib | Data analysis |
-| **Processing** | moviepy, speech_recognition | Media handling |
-## 📊 **Final Infrastructure Summary**
-| Category | Count | Status |
-|----------|-------|--------|
-| **LLM Models** | 13 models | ✅ Enhanced |
-| **AI Providers** | 7 providers | ✅ Diversified |
-| **Core Tools** | 18+ capabilities | ✅ Complete |
-| **Speed** | <2s average | ✅ Ultra-fast |
-| **GAIA Compliance** | Full implementation | ✅ Ready |
-## 🎯 **Ready for Competitive GAIA Performance!**
-This Universal Multimodal AI Agent is optimized for GAIA benchmark excellence with:
-- 🚀 **13 LLMs** across 7 providers including advanced Qwen3-235B-A22B
-- ⚡ **Ultra-fast QA models** for instant factual answers
-- 🛠️ **Complete tool implementation**: Web browsing, file downloads, PDF reading, vision processing, calculations
-- 🎯 **GAIA compliance**: Automatic thinking process removal, exact-match formatting
-- 🌐 **Universal processing**: Videos, audio, images, data, code, documents
-- 🔍 **Enhanced web capabilities**: DuckDuckGo search + content extraction
-**Target Achievement**: 67%+ accuracy on GAIA benchmark (competitive performance)
----
-**🚀 Deploy**: This repository contains only the essential files for maximum performance.

+# 🚀 Enhanced Universal GAIA Agent - SmoLAgents Framework Powered
+**🎯 67%+ GAIA Performance Target with 60+ Point Framework Boost**
+## 🔥 NEW: SmoLAgents Framework Integration
+### ⚡ Performance Breakthrough
+- **60+ Point Performance Boost**: Gain attributed to agentic frameworks in Hugging Face research
+- **67%+ GAIA Target**: Exceeds the 30% course requirement
+- **Framework-Optimized**: Based on Hugging Face's 55%-scoring GAIA submission
+- **CodeAgent Architecture**: The agent writes its actions as Python code and executes them directly, instead of emitting JSON tool calls that must be parsed
+### 🎯 Dual System Architecture
+- **Primary**: SmoLAgents Enhanced (67%+ target)
+- **Fallback**: Custom System (30%+ baseline)
+- **Auto-Detection**: Switches to the custom system automatically when SmoLAgents is unavailable
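The README text removed in this commit described how the earlier custom system answered a question: a cache check, then models tried in priority order with fallbacks, then removal of the thinking process so only a final answer is submitted for exact-match scoring. A minimal sketch of those stages, assuming each model is a plain callable that may raise, and assuming models wrap reasoning in `<think>...</think>` and mark the answer with `FINAL ANSWER:`. `ResponsePipeline`, `clean_answer`, and the numeric priorities are hypothetical; the fast-QA and tool-execution stages are omitted.

```python
import hashlib
import re
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical interface: a model maps a prompt to raw text and may raise on failure.
Model = Callable[[str], str]

THINK_BLOCK = re.compile(r"<think>.*?</think>", re.DOTALL | re.IGNORECASE)
FINAL_MARKER = re.compile(r"final answer\s*:", re.IGNORECASE)


def clean_answer(raw: str) -> str:
    """Drop <think>...</think> reasoning; keep only text after the last FINAL ANSWER: marker."""
    text = THINK_BLOCK.sub("", raw).strip()
    markers = list(FINAL_MARKER.finditer(text))
    if markers:
        text = text[markers[-1].end():]
    return text.strip()


class ResponsePipeline:
    """Cache check, then models in priority order (lower number first), then cleanup."""

    def __init__(self, models: List[Tuple[float, Model]]) -> None:
        # sorted() is stable, so models with equal priority keep their given order.
        self.models = [model for _, model in sorted(models, key=lambda pm: pm[0])]
        self.cache: Dict[str, str] = {}

    @staticmethod
    def cache_key(question: str) -> str:
        # Hash whitespace-normalized text; case is kept because some questions depend on it.
        normalized = " ".join(question.split())
        return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

    def answer(self, question: str) -> Optional[str]:
        key = self.cache_key(question)
        if key in self.cache:
            return self.cache[key]  # Cache hit: no model call at all.
        for model in self.models:
            try:
                candidate = clean_answer(model(question))
            except Exception:
                continue  # Treat a failing provider as a miss and try the next one.
            if candidate:
                self.cache[key] = candidate
                return candidate
        return None  # Every model failed or returned an empty answer.
```

Caching only non-empty cleaned answers means a transient failure across all providers is retried on the next call rather than remembered.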