Multi-Agent Video Generation System - Architecture Overview
System Purpose
This multi-agent system automatically generates educational videos with Manim (Mathematical Animation Engine). Coordinated AI agents turn textual descriptions of mathematical concepts, theorems, and other educational content into animated videos.
System Architecture
flowchart TD
%% Input Layer
U["User Input<br/>(Topic & Context)"]:::input
GV["generate_video.py<br/>(Main Orchestrator)"]:::input
ES["evaluate.py<br/>(Quality Assessment)"]:::input
%% Configuration and Data
CONF["Configuration<br/>(.env, src/config)"]:::config
DATA["Data Repository<br/>(data/)"]:::data
%% Core Generation Pipeline
subgraph "Core Multi-Agent Pipeline"
CG["Code Generation Agent<br/>(src/core/code_generator.py)"]:::core
VP["Video Planning Agent<br/>(src/core/video_planner.py)"]:::core
VR["Video Rendering Agent<br/>(src/core/video_renderer.py)"]:::core
end
%% Retrieval & Augmentation (RAG)
RAG["RAG Intelligence Agent<br/>(src/rag/rag_integration.py,<br/>src/rag/vector_store.py)"]:::rag
%% Task & Prompt Generation
TASK["Task & Prompt Generation<br/>(task_generator/)"]:::task
%% External LLM & Model Tools
LLM["LLM Provider Agents<br/>(mllm_tools/)"]:::ai
%% Voiceover & Utilities
VOX["Utility Services<br/>(src/utils/)"]:::voice
%% Evaluation Module
EVAL["Quality Evaluation Agent<br/>(eval_suite/)"]:::eval
%% Connections
U -->|"provides data"| GV
GV -->|"reads configuration"| CONF
CONF -->|"configures processing"| CG
CONF -->|"fetches theorem data"| DATA
%% Core Pipeline Flow
GV -->|"orchestrates generation"| CG
CG -->|"sends code/instructions"| VP
VP -->|"plans scenes"| VR
VR -->|"integrates audio"| VOX
VOX -->|"produces final video"| EVAL
%% Cross Module Integrations
TASK -->|"supplies prompt templates"| CG
TASK -->|"guides scene planning"| VP
CG -->|"augments with retrieval"| RAG
VP -->|"queries documentation"| RAG
LLM -->|"supports AI generation"| CG
LLM -->|"supports task generation"| TASK
%% Evaluation Script
ES -->|"evaluates output"| EVAL
%% Styles
classDef input fill:#FFD580,stroke:#333,stroke-width:2px;
classDef config fill:#B3E5FC,stroke:#333,stroke-width:2px;
classDef data fill:#C8E6C9,stroke:#333,stroke-width:2px;
classDef core fill:#FFF59D,stroke:#333,stroke-width:2px;
classDef rag fill:#FFCC80,stroke:#333,stroke-width:2px;
classDef task fill:#D1C4E9,stroke:#333,stroke-width:2px;
classDef ai fill:#B2EBF2,stroke:#333,stroke-width:2px;
classDef voice fill:#FFE0B2,stroke:#333,stroke-width:2px;
classDef eval fill:#E1BEE7,stroke:#333,stroke-width:2px;
Core Agents & Responsibilities
1. Video Planning Agent (src/core/video_planner.py)
Role: Strategic planning and scene orchestration
Key Capabilities:
- Scene outline generation and decomposition
- Storyboard creation with visual descriptions
- Technical implementation planning
- Concurrent planning of multiple scenes
- Context learning from previous examples
- RAG integration for Manim documentation retrieval
Key Methods:
- generate_scene_outline() - Creates overall video structure
- generate_scene_implementation_concurrently_enhanced() - Parallel scene planning
- _initialize_context_examples() - Loads learning contexts
2. Code Generation Agent (src/core/code_generator.py)
Role: Manim code synthesis and optimization
Key Capabilities:
- Intelligent Manim code generation from scene descriptions
- Automatic error detection and fixing
- Visual self-reflection for code quality
- RAG-enhanced code generation with documentation context
- Context learning from successful examples
- Filtering of banned reasoning patterns
Key Methods:
- generate_manim_code() - Primary code generation
- fix_code_errors() - Intelligent error correction
- visual_self_reflection() - Quality validation
3. Video Rendering Agent (src/core/video_renderer.py)
Role: Video compilation and optimization
Key Capabilities:
- Optimized Manim scene rendering
- Caching keyed on a hash of the scene code, so unchanged scenes are not re-rendered
- Parallel scene processing
- Quality preset management (preview/low/medium/high/production)
- GPU acceleration support
- Video combination and assembly
Key Methods:
- render_scene_optimized() - Enhanced scene rendering
- combine_videos_optimized() - Final video assembly
- _get_code_hash() - Intelligent caching
4. RAG Intelligence Agent (src/rag/rag_integration.py, src/rag/vector_store.py)
Role: Knowledge retrieval and context augmentation
Key Capabilities:
- Manim documentation retrieval
- Plugin detection and relevance scoring
- Vector store management with ChromaDB
- Query generation for technical contexts
- Enhanced document embedding and retrieval
Key Methods:
- detect_relevant_plugins() - Smart plugin identification
- retrieve_relevant_docs() - Context-aware documentation retrieval
- generate_rag_queries() - Intelligent query formulation
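To make the retrieval step concrete, here is a minimal sketch of ranking documentation snippets against a query. It is not the project's implementation: the bag-of-words "embedding" and the names embed, cosine, retrieve_top_k, and DOCS are hypothetical stand-ins, and a real setup would use a learned embedding model with the ChromaDB vector store mentioned above.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding' (assumption: stands in for a real embedding model)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[term] * b[term] for term in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve_top_k(query: str, docs: list[str], k: int = 2) -> list[str]:
    """Return the k documents most similar to the query (hypothetical helper)."""
    query_vec = embed(query)
    ranked = sorted(docs, key=lambda d: cosine(query_vec, embed(d)), reverse=True)
    return ranked[:k]


# Illustrative snippets only; not taken from the actual Manim documentation.
DOCS = [
    "Circle creates a circle mobject with a given radius",
    "Transform morphs one mobject into another",
    "Axes draws coordinate axes for plotting functions",
]

top = retrieve_top_k("how to draw a circle with radius", DOCS, k=1)
```

The retrieved snippets would then be appended to the planning or code-generation prompt as extra context.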
5. Task & Prompt Generation Service (task_generator/)
Role: Template management and prompt engineering
Key Capabilities:
- Dynamic prompt template generation
- Context-aware prompt customization
- Banned reasoning pattern management
- Multi-modal prompt support
Key Components:
- parse_prompt.py - Template processing
- prompts_raw/ - Prompt template repository
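As an illustration of context-aware prompt customization, the sketch below fills a template with per-scene values. The template text, the build_prompt helper, and the treatment of banned reasoning patterns as a list of phrases to avoid are all assumptions made for this example; the actual templates live in prompts_raw/ and may look quite different.

```python
from string import Template

# Hypothetical template; the real ones are stored under prompts_raw/.
SCENE_PROMPT = Template(
    "You are planning a Manim scene.\n"
    "Topic: $topic\n"
    "Scene $scene_number: $scene_title\n"
    "Avoid the following phrases: $banned"
)


def build_prompt(topic: str, scene_number: int, scene_title: str, banned: list[str]) -> str:
    """Fill the template; substitute() raises KeyError if a placeholder is missing."""
    return SCENE_PROMPT.substitute(
        topic=topic,
        scene_number=scene_number,
        scene_title=scene_title,
        banned=", ".join(banned),
    )


prompt = build_prompt("Pythagorean Theorem", 1, "Statement", ["obviously", "trivially"])
```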
6. LLM Provider Agents (mllm_tools/)
Role: AI model abstraction and management
Key Capabilities:
- Multi-provider LLM support (OpenAI, Gemini, Vertex AI, OpenRouter)
- Unified interface for different AI models
- Cost tracking and usage monitoring
- Langfuse integration for observability
Key Components:
- litellm.py - LiteLLM wrapper for multiple providers
- openrouter.py - OpenRouter integration
- gemini.py - Google Gemini integration
- vertex_ai.py - Google Cloud Vertex AI
7. Quality Evaluation Agent (eval_suite/)
Role: Output validation and quality assurance
Key Capabilities:
- Multi-modal content evaluation (text, image, video)
- Automated quality scoring
- Error pattern detection
- Performance metrics collection
Key Components:
- text_utils.py - Text quality evaluation
- image_utils.py - Visual content assessment
- video_utils.py - Video quality metrics
Multi-Agent Workflow
Phase 1: Initialization & Planning
- System Orchestrator (generate_video.py) receives user input
- Configuration Manager loads system settings and model configurations
- Session Manager creates/loads session for continuity
- Video Planning Agent analyzes topic and creates scene breakdown
- RAG Agent detects relevant plugins and retrieves documentation
Phase 2: Implementation Planning
- Video Planning Agent generates detailed implementation plans for each scene
- Task Generator provides appropriate prompt templates
- RAG Agent augments plans with relevant technical documentation
- Scene Analyzer validates plan completeness
Phase 3: Code Generation
- Code Generation Agent transforms scene plans into Manim code
- RAG Agent provides contextual documentation for complex animations
- Error Detection validates code syntax and logic
- Quality Assurance ensures code meets standards
Phase 4: Rendering & Assembly
- Video Rendering Agent executes Manim code to generate scenes
- Caching System reuses earlier renders when the scene code is unchanged
- Parallel Processing renders multiple scenes concurrently
- Quality Control validates rendered output
Phase 5: Final Assembly
- Video Rendering Agent combines individual scenes
- Audio Integration adds voiceovers and sound effects
- Quality Evaluation Agent performs final validation
- Output Manager delivers final video with metadata
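The five phases above can be read as a sequence of stages that each enrich a shared state object. The sketch below shows that shape only; PipelineState, the stage functions, and their placeholder outputs are hypothetical and do not mirror the real generate_video.py.

```python
from dataclasses import dataclass, field


@dataclass
class PipelineState:
    """Hypothetical state handed from one phase to the next."""
    topic: str
    outline: list[str] = field(default_factory=list)
    code: dict[int, str] = field(default_factory=dict)
    rendered: list[str] = field(default_factory=list)
    final_video: str = ""


def plan(state: PipelineState) -> None:
    # Phases 1-2: break the topic into scenes (stub output).
    state.outline = [f"{state.topic}: intro", f"{state.topic}: proof"]


def generate_code(state: PipelineState) -> None:
    # Phase 3: one code artifact per planned scene (stub output).
    state.code = {i: f"# Manim code for '{title}'" for i, title in enumerate(state.outline, 1)}


def render(state: PipelineState) -> None:
    # Phase 4: one rendered file per scene (stub output).
    state.rendered = [f"scene{i}.mp4" for i in sorted(state.code)]


def assemble(state: PipelineState) -> None:
    # Phase 5: combine the scene files into one artifact (stub output).
    state.final_video = "+".join(state.rendered)


STAGES = [plan, generate_code, render, assemble]


def run_pipeline(topic: str) -> PipelineState:
    state = PipelineState(topic)
    for stage in STAGES:
        stage(state)
    return state


result = run_pipeline("Pythagorean Theorem")
```

Keeping the stages in a list makes the handoff order explicit and lets a stage be swapped or inserted without touching the others.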
Design Principles
SOLID Principles Implementation
Single Responsibility Principle
- Each agent has a focused, well-defined purpose
- Clear separation of concerns across components
Open/Closed Principle
- System extensible through composition and interfaces
- New agents can be added without modifying existing code
Liskov Substitution Principle
- Agents implement common interfaces for interchangeability
- Protocol-based design ensures compatibility
Interface Segregation Principle
- Clean, focused interfaces for agent communication
- No forced dependencies on unused functionality
Dependency Inversion Principle
- High-level modules depend on abstractions
- Factory pattern for component creation
Multi-Agent Coordination Patterns
- Pipeline Architecture: Sequential processing with clear handoffs
- Publish-Subscribe: Event-driven communication between agents
- Factory Pattern: Dynamic agent creation and configuration
- Strategy Pattern: Pluggable algorithms for different tasks
- Observer Pattern: Monitoring and logging across agents
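A small sketch of how a protocol-based interface and a factory might fit together is shown below. The Agent protocol, the two toy agents, AGENT_REGISTRY, and create_agent are hypothetical names chosen for illustration; the real agents have richer interfaces.

```python
from typing import Callable, Protocol


class Agent(Protocol):
    """Assumed common interface: every agent exposes run()."""
    def run(self, payload: str) -> str: ...


class UppercaseAgent:
    def run(self, payload: str) -> str:
        return payload.upper()


class ReverseAgent:
    def run(self, payload: str) -> str:
        return payload[::-1]


# Factory: maps a configuration key to a constructor, so new agents are
# added by registering them rather than by editing callers.
AGENT_REGISTRY: dict[str, Callable[[], Agent]] = {
    "upper": UppercaseAgent,
    "reverse": ReverseAgent,
}


def create_agent(kind: str) -> Agent:
    try:
        return AGENT_REGISTRY[kind]()
    except KeyError:
        raise ValueError(f"unknown agent kind: {kind!r}") from None


def run_chain(kinds: list[str], payload: str) -> str:
    """Pipeline: each agent's output becomes the next agent's input."""
    for kind in kinds:
        payload = create_agent(kind).run(payload)
    return payload


chained = run_chain(["upper", "reverse"], "abc")
```

Because callers depend only on the Agent protocol, any class with a matching run() method can be substituted without inheritance.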
Performance Optimizations
Concurrency & Parallelization
- Async/Await: Non-blocking agent coordination
- Semaphore Control: Caps how many scenes are processed at once
- Thread Pools: Parallel I/O operations
- Concurrent Scene Processing: Multiple scenes rendered simultaneously
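The combination of async/await and a semaphore can be sketched as follows. This is an illustrative example rather than the project's code: process_scene and process_all are hypothetical, and asyncio.sleep stands in for an LLM call or a render.

```python
import asyncio


async def process_scene(scene_id: int, limit: asyncio.Semaphore, active: list[int]) -> str:
    async with limit:  # waits here if the concurrency cap is reached
        active[0] += 1
        active[1] = max(active[1], active[0])  # record peak concurrency
        await asyncio.sleep(0.01)  # stand-in for an LLM call or render
        active[0] -= 1
    return f"scene {scene_id} done"


async def process_all(scene_ids, max_scene_concurrency: int = 3):
    limit = asyncio.Semaphore(max_scene_concurrency)
    active = [0, 0]  # [currently running, peak running]
    results = await asyncio.gather(
        *(process_scene(s, limit, active) for s in scene_ids)
    )
    return results, active[1]


results, peak = asyncio.run(process_all(range(1, 7), max_scene_concurrency=3))
```

asyncio.gather returns results in the order the scenes were submitted, so downstream assembly does not need to re-sort them.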
Intelligent Caching
- Code Hash-based Caching: Avoid redundant renders
- Context Caching: Reuse prompt templates and examples
- Vector Store Caching: Optimized document retrieval
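Code hash-based caching can be sketched like this. The RenderCache class and the choice of SHA-256 are assumptions for illustration; the source only states that _get_code_hash() is used for caching, not how the hash is computed or where results are stored.

```python
import hashlib
from typing import Callable


class RenderCache:
    """Hypothetical in-memory cache keyed on a hash of the scene code."""

    def __init__(self) -> None:
        self._store: dict[str, str] = {}
        self.hits = 0
        self.misses = 0

    @staticmethod
    def code_hash(code: str) -> str:
        # SHA-256 is an assumption; any stable content hash would do.
        return hashlib.sha256(code.encode("utf-8")).hexdigest()

    def get_or_render(self, code: str, render: Callable[[str], str]) -> str:
        key = self.code_hash(code)
        if key in self._store:
            self.hits += 1
            return self._store[key]
        self.misses += 1
        self._store[key] = render(code)  # only render on a cache miss
        return self._store[key]


def fake_render(code: str) -> str:
    """Stand-in for an actual Manim render; returns a made-up file name."""
    return f"video_{len(code)}.mp4"


cache = RenderCache()
first = cache.get_or_render("class A(Scene): pass", fake_render)
second = cache.get_or_render("class A(Scene): pass", fake_render)
```

Any change to the code string, even whitespace, produces a new key and therefore a fresh render.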
Resource Management
- GPU Acceleration: Hardware-accelerated rendering
- Memory Optimization: Efficient data structures
- Quality Presets: Speed vs. quality tradeoffs
Configuration Management
Environment Configuration (.env, src/config/config.py)
from dataclasses import dataclass
from typing import Optional

@dataclass
class VideoGenerationConfig:
    planner_model: str                    # Primary AI model
    scene_model: Optional[str] = None     # Scene-specific model
    helper_model: Optional[str] = None    # Helper tasks model
    max_scene_concurrency: int = 5        # Parallel scene limit
    use_rag: bool = False                 # RAG integration
    enable_caching: bool = True           # Performance caching
    use_gpu_acceleration: bool = False    # Hardware acceleration
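One way such a configuration could be populated from environment-style settings is sketched below. DemoConfig is a trimmed, hypothetical copy of the class above, and the variable names PLANNER_MODEL, MAX_SCENE_CONCURRENCY, and USE_RAG are invented for this example; check the project's .env file for the real keys.

```python
from dataclasses import dataclass
from typing import Mapping


@dataclass
class DemoConfig:
    """Trimmed, hypothetical stand-in for VideoGenerationConfig."""
    planner_model: str
    max_scene_concurrency: int = 5
    use_rag: bool = False


def config_from_env(env: Mapping[str, str]) -> DemoConfig:
    """Build a config from string settings; key names are assumptions."""
    return DemoConfig(
        planner_model=env["PLANNER_MODEL"],  # required, raises KeyError if absent
        max_scene_concurrency=int(env.get("MAX_SCENE_CONCURRENCY", "5")),
        use_rag=env.get("USE_RAG", "false").lower() in {"1", "true", "yes"},
    )


cfg = config_from_env(
    {"PLANNER_MODEL": "gemini/gemini-2.5-flash-preview-04-17", "USE_RAG": "true"}
)
```

In practice the mapping passed in would be os.environ after the .env file has been loaded.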
Model Provider Configuration
- Support for multiple LLM providers (OpenAI, Gemini, Claude, etc.)
- Unified interface through LiteLLM
- Cost tracking and usage monitoring
- Automatic failover capabilities
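Automatic failover can be pictured as trying providers in priority order until one answers. The sketch below uses plain callables in place of real provider clients; complete_with_failover, flaky_provider, and backup_provider are hypothetical and do not reflect the LiteLLM API.

```python
from typing import Callable, Sequence


class AllProvidersFailed(RuntimeError):
    """Raised when every configured provider has failed."""


def complete_with_failover(
    prompt: str,
    providers: Sequence[tuple[str, Callable[[str], str]]],
) -> tuple[str, str]:
    """Try each (name, call) pair in order; return the first success."""
    errors = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except Exception as exc:  # a real system would catch narrower errors
            errors.append(f"{name}: {exc}")
    raise AllProvidersFailed("; ".join(errors))


def flaky_provider(prompt: str) -> str:
    raise TimeoutError("timed out")  # simulates an unavailable provider


def backup_provider(prompt: str) -> str:
    return f"echo: {prompt}"  # simulates a working provider


used, reply = complete_with_failover(
    "hi", [("primary", flaky_provider), ("backup", backup_provider)]
)
```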
Data Flow Architecture
Input Data Sources
- Theorem Datasets: JSON files with mathematical concepts (data/thb_*/)
- Context Learning: Historical examples (data/context_learning/)
- RAG Documentation: Manim docs and plugins (data/rag/manim_docs/)
Processing Pipeline
User Input → Topic Analysis → Scene Planning → Code Generation → Rendering → Quality Check → Final Output
     ↓              ↓                ↓                ↓              ↓             ↓
Configuration → RAG Context → Implementation → Error Fixing → Optimization → Validation
Output Artifacts
- Scene Outlines: Structured video plans
- Implementation Plans: Technical specifications
- Manim Code: Executable animation scripts
- Rendered Videos: Individual scene outputs
- Combined Videos: Final assembled content
- Metadata: Processing logs and metrics
Advanced Features
Error Recovery & Self-Healing
- Multi-layer Retry Logic: Automatic error recovery at each agent level
- Intelligent Error Analysis: Pattern recognition for common failures
- Self-Reflection: Code quality validation through visual analysis
- Fallback Strategies: Alternative approaches when primary methods fail
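The retry-and-fix loop behind error recovery might look roughly like the sketch below. Everything here is illustrative: run_with_fixes, syntax_check, and add_missing_colon are hypothetical, compile() stands in for actually running Manim, and the string replacement stands in for an LLM-based fixer such as fix_code_errors().

```python
from typing import Callable, Optional


def run_with_fixes(
    code: str,
    execute: Callable[[str], Optional[str]],
    fix: Callable[[str, str], str],
    max_attempts: int = 3,
) -> tuple[str, int]:
    """Execute code; on failure, ask the fixer for a new version and retry."""
    last_error = None
    for attempt in range(1, max_attempts + 1):
        last_error = execute(code)  # None means success
        if last_error is None:
            return code, attempt
        if attempt < max_attempts:
            code = fix(code, last_error)
    raise RuntimeError(f"still failing after {max_attempts} attempts: {last_error}")


def syntax_check(code: str) -> Optional[str]:
    """Stand-in for rendering: only checks that the code parses."""
    try:
        compile(code, "<scene>", "exec")
        return None
    except SyntaxError as exc:
        return str(exc)


def add_missing_colon(code: str, error: str) -> str:
    """Toy fixer; a real one would send the code and error to an LLM."""
    return code.replace("def construct(self)\n", "def construct(self):\n")


broken = "class Demo:\n    def construct(self)\n        pass\n"
fixed, attempts = run_with_fixes(broken, syntax_check, add_missing_colon)
```

Bounding the number of attempts keeps a scene that cannot be repaired from consuming tokens indefinitely.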
Monitoring & Observability
- Langfuse Integration: Comprehensive LLM call tracking
- Performance Metrics: Render times, success rates, resource usage
- Status Dashboard: Real-time pipeline state visualization
- Cost Tracking: Token usage and API cost monitoring
Scalability Features
- Horizontal Scaling: Multiple concurrent topic processing
- Resource Pooling: Shared computational resources
- Load Balancing: Intelligent task distribution
- State Persistence: Resume interrupted processing
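State persistence for resuming interrupted runs can be sketched with a small checkpoint file. The file name progress.json, the JSON layout, and the process_scenes helper are assumptions for this example, not the project's actual session format.

```python
import json
import tempfile
from pathlib import Path


def load_done(path: Path) -> set[int]:
    """Read the set of finished scene ids, or an empty set on first run."""
    if path.exists():
        return set(json.loads(path.read_text()))
    return set()


def process_scenes(scene_ids, path: Path, work, stop_after=None) -> list[int]:
    """Process unfinished scenes, checkpointing after each one."""
    done = load_done(path)
    processed = []
    for scene_id in scene_ids:
        if scene_id in done:
            continue  # finished in an earlier run
        if stop_after is not None and len(processed) >= stop_after:
            break  # simulates an interruption
        work(scene_id)
        done.add(scene_id)
        path.write_text(json.dumps(sorted(done)))
        processed.append(scene_id)
    return processed


with tempfile.TemporaryDirectory() as tmp:
    checkpoint = Path(tmp) / "progress.json"
    first_run = process_scenes([1, 2, 3, 4], checkpoint, work=lambda s: None, stop_after=2)
    second_run = process_scenes([1, 2, 3, 4], checkpoint, work=lambda s: None)
```

The second call skips the scenes recorded by the first and picks up where it stopped.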
Usage Examples
Single Topic Generation
python generate_video.py \
    --topic "Pythagorean Theorem" \
    --context "Explain the mathematical proof and visual demonstration" \
    --model "gemini/gemini-2.5-flash-preview-04-17" \
    --use_rag \
    --quality medium
Batch Processing
python generate_video.py \
    --theorems_path data/thb_easy/math.json \
    --sample_size 5 \
    --max_scene_concurrency 3 \
    --use_context_learning \
    --enable_caching
Status Monitoring
python generate_video.py \
    --theorems_path data/thb_easy/math.json \
    --check_status
System Metrics & KPIs
Performance Indicators
- Scene Generation Speed: Average time per scene
- Rendering Efficiency: Cache hit rates and parallel utilization
- Quality Scores: Automated evaluation metrics
- Success Rates: Completion percentage across pipeline stages
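As a rough illustration of how these indicators could be derived from per-scene records, consider the sketch below. SceneRecord and summarize are hypothetical, and the sample numbers are made up for the example rather than measured.

```python
from dataclasses import dataclass


@dataclass
class SceneRecord:
    """Hypothetical per-scene log entry."""
    seconds: float
    cache_hit: bool
    succeeded: bool


def summarize(records: list[SceneRecord]) -> dict[str, float]:
    """Aggregate per-scene records into the indicators listed above."""
    n = len(records)
    if n == 0:
        return {"avg_seconds_per_scene": 0.0, "cache_hit_rate": 0.0, "success_rate": 0.0}
    return {
        "avg_seconds_per_scene": sum(r.seconds for r in records) / n,
        "cache_hit_rate": sum(r.cache_hit for r in records) / n,
        "success_rate": sum(r.succeeded for r in records) / n,
    }


# Made-up sample data for illustration only.
summary = summarize([
    SceneRecord(10.0, False, True),
    SceneRecord(2.0, True, True),
    SceneRecord(12.0, False, False),
    SceneRecord(4.0, True, True),
])
```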
Resource Utilization
- LLM Token Usage: Cost optimization and efficiency
- Computational Resources: CPU/GPU utilization
- Storage Efficiency: Cache effectiveness and data management
- Memory Footprint: System resource consumption
Future Enhancements
Planned Agent Improvements
- Advanced Visual Agent: Enhanced image understanding and generation
- Audio Synthesis Agent: Dynamic voiceover generation
- Interactive Agent: Real-time user feedback integration
- Curriculum Agent: Adaptive learning path generation
Technical Roadmap
- Distributed Processing: Multi-node agent deployment
- Real-time Streaming: Live video generation capabilities
- Mobile Integration: Responsive design for mobile platforms
- API Gateway: RESTful service architecture
Related Documentation
- API Reference - Detailed method documentation
- Configuration Guide - Setup and customization
- Development Guide - Contributing and extending
- Troubleshooting - Common issues and solutions
Last Updated: August 25, 2025
Version: Multi-Agent Enhanced Pipeline v2.0
Maintainer: T2M Development Team