# AnkiGen Agent System
A multi-agent system that generates high-quality flashcards by coordinating specialized AI agents.
## Overview
The AnkiGen Agent System replaces the traditional single-LLM approach with a pipeline of specialized agents:
- **Generator Agents**: Create cards with domain expertise
- **Judge Agents**: Assess quality using multiple criteria
- **Enhancement Agents**: Improve and enrich card content
- **Coordinators**: Orchestrate workflows and handoffs
## Quick Start
### 1. Installation
```bash
pip install openai-agents pyyaml
```
### 2. Environment Configuration
Create a `.env` file or set environment variables:
```bash
# Basic agent mode
export ANKIGEN_AGENT_MODE=hybrid
# Enable specific agents
export ANKIGEN_ENABLE_SUBJECT_EXPERT=true
export ANKIGEN_ENABLE_CONTENT_JUDGE=true
export ANKIGEN_ENABLE_CLARITY_JUDGE=true
# Performance settings
export ANKIGEN_AGENT_TIMEOUT=30.0
export ANKIGEN_MIN_JUDGE_CONSENSUS=0.6
```
### 3. Usage
```python
from ankigen_core.agents.integration import AgentOrchestrator
from ankigen_core.llm_interface import OpenAIClientManager

# Initialize
client_manager = OpenAIClientManager()
orchestrator = AgentOrchestrator(client_manager)
await orchestrator.initialize("your-openai-api-key")

# Generate cards with agents
cards, metadata = await orchestrator.generate_cards_with_agents(
    topic="Python Functions",
    subject="programming",
    num_cards=5,
    difficulty="intermediate",
)
```
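Note: `initialize` and `generate_cards_with_agents` are coroutines, so run the snippet above inside an async function and drive it with `asyncio.run(...)` (or an existing event loop).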
## Agent Types
### Generation Agents
#### SubjectExpertAgent
- **Purpose**: Domain-specific card generation
- **Specialties**: Technical accuracy, terminology, real-world applications
- **Configuration**: `ANKIGEN_ENABLE_SUBJECT_EXPERT=true`
#### PedagogicalAgent
- **Purpose**: Educational effectiveness review
- **Specialties**: Bloom's taxonomy, cognitive load, learning objectives
- **Configuration**: `ANKIGEN_ENABLE_PEDAGOGICAL_AGENT=true`
#### ContentStructuringAgent
- **Purpose**: Consistent formatting and organization
- **Specialties**: Metadata enrichment, standardization
- **Configuration**: `ANKIGEN_ENABLE_CONTENT_STRUCTURING=true`
#### GenerationCoordinator
- **Purpose**: Orchestrates multi-agent generation workflows
- **Configuration**: `ANKIGEN_ENABLE_GENERATION_COORDINATOR=true`
### Judge Agents
#### ContentAccuracyJudge
- **Evaluates**: Factual correctness, terminology, misconceptions
- **Model**: GPT-4o (high accuracy needed)
- **Configuration**: `ANKIGEN_ENABLE_CONTENT_JUDGE=true`
#### PedagogicalJudge
- **Evaluates**: Educational effectiveness, cognitive levels
- **Model**: GPT-4o
- **Configuration**: `ANKIGEN_ENABLE_PEDAGOGICAL_JUDGE=true`
#### ClarityJudge
- **Evaluates**: Communication clarity, readability
- **Model**: GPT-4o-mini (cost-effective)
- **Configuration**: `ANKIGEN_ENABLE_CLARITY_JUDGE=true`
#### TechnicalJudge
- **Evaluates**: Code syntax, best practices (technical content only)
- **Model**: GPT-4o
- **Configuration**: `ANKIGEN_ENABLE_TECHNICAL_JUDGE=true`
#### CompletenessJudge
- **Evaluates**: Required fields, metadata, quality standards
- **Model**: GPT-4o-mini
- **Configuration**: `ANKIGEN_ENABLE_COMPLETENESS_JUDGE=true`
### Enhancement Agents
#### RevisionAgent
- **Purpose**: Improves rejected cards based on judge feedback
- **Configuration**: `ANKIGEN_ENABLE_REVISION_AGENT=true`
#### EnhancementAgent
- **Purpose**: Adds missing content and enriches metadata
- **Configuration**: `ANKIGEN_ENABLE_ENHANCEMENT_AGENT=true`
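These flags compose freely. For example, a setup that pairs the subject expert with two judges and automatic revision might look like:
```bash
export ANKIGEN_AGENT_MODE=hybrid
export ANKIGEN_ENABLE_SUBJECT_EXPERT=true
export ANKIGEN_ENABLE_CONTENT_JUDGE=true
export ANKIGEN_ENABLE_CLARITY_JUDGE=true
export ANKIGEN_ENABLE_REVISION_AGENT=true
```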
## Operating Modes
### Legacy Mode
```bash
export ANKIGEN_AGENT_MODE=legacy
```
Uses the original single-LLM approach.
### Agent-Only Mode
```bash
export ANKIGEN_AGENT_MODE=agent_only
```
Forces use of the agent system for all generation.
### Hybrid Mode
```bash
export ANKIGEN_AGENT_MODE=hybrid
```
Uses agents when enabled via feature flags and falls back to the legacy pipeline otherwise.
### A/B Testing Mode
```bash
export ANKIGEN_AGENT_MODE=a_b_test
export ANKIGEN_AB_TEST_RATIO=0.5
```
Randomly assigns users to agent vs legacy generation for comparison.
## Configuration
### Agent Configuration Files
Agents can be configured via YAML files in `config/agents/`:
```yaml
# config/agents/defaults/generators.yaml
agents:
  subject_expert:
    instructions: "You are a world-class expert in {subject}..."
    model: "gpt-4o"
    temperature: 0.7
    timeout: 45.0
    custom_prompts:
      math: "Focus on problem-solving strategies"
      science: "Emphasize experimental design"
### Environment Variables
#### Agent Control
- `ANKIGEN_AGENT_MODE`: Operating mode (legacy/agent_only/hybrid/a_b_test)
- `ANKIGEN_ENABLE_*`: Enable specific agents (true/false)
#### Performance
- `ANKIGEN_AGENT_TIMEOUT`: Agent execution timeout (seconds)
- `ANKIGEN_MAX_AGENT_RETRIES`: Maximum retry attempts
- `ANKIGEN_ENABLE_AGENT_CACHING`: Enable response caching
#### Quality Control
- `ANKIGEN_MIN_JUDGE_CONSENSUS`: Minimum agreement between judges (0.0-1.0)
- `ANKIGEN_MAX_REVISION_ITERATIONS`: Maximum revision attempts
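For example (illustrative values; tune them to your workload):
```bash
export ANKIGEN_AGENT_TIMEOUT=30.0
export ANKIGEN_MAX_AGENT_RETRIES=3
export ANKIGEN_ENABLE_AGENT_CACHING=true
export ANKIGEN_MIN_JUDGE_CONSENSUS=0.6
export ANKIGEN_MAX_REVISION_ITERATIONS=2
```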
## Monitoring & Metrics
### Built-in Metrics
The system automatically tracks:
- Agent execution times and success rates
- Quality approval/rejection rates
- Token usage and costs
- Judge consensus scores
### Performance Dashboard
```python
orchestrator = AgentOrchestrator(client_manager)
metrics = orchestrator.get_performance_metrics()
print(f"24h Performance: {metrics['agent_performance']}")
print(f"Quality Metrics: {metrics['quality_metrics']}")
```
### Tracing
The OpenAI Agents SDK provides a built-in tracing UI for debugging workflows.
## Quality Pipeline
### Phase 1: Generation
1. Route to appropriate subject expert
2. Generate initial cards
3. Optional pedagogical review
4. Optional content structuring
### Phase 2: Quality Assessment
1. Route cards to relevant judges
2. Parallel evaluation by multiple specialists
3. Calculate consensus scores
4. Approve/reject based on thresholds
### Phase 3: Improvement
1. Revise rejected cards using judge feedback
2. Re-evaluate revised cards
3. Enhance approved cards with additional content
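The sketch below shows the shape of this pipeline with stub judges and a stub reviser; the class and function names are illustrative, not AnkiGen's actual API.
```python
import asyncio
from dataclasses import dataclass, field

@dataclass
class Card:
    front: str
    back: str
    scores: list = field(default_factory=list)

# Stub judges standing in for ContentAccuracyJudge, ClarityJudge, etc.
async def accuracy_judge(card: Card) -> float:
    return 0.9

async def clarity_judge(card: Card) -> float:
    return 0.7

async def assess(cards, judges, min_consensus=0.6):
    """Phase 2: judges score each card in parallel; consensus is the mean score."""
    approved, rejected = [], []
    for card in cards:
        card.scores = list(await asyncio.gather(*(j(card) for j in judges)))
        consensus = sum(card.scores) / len(card.scores)
        (approved if consensus >= min_consensus else rejected).append(card)
    return approved, rejected

async def revise(card: Card) -> Card:
    """Phase 3 stub: a real RevisionAgent rewrites cards using judge feedback."""
    return card

async def pipeline(cards, judges, max_revisions=2):
    approved, rejected = await assess(cards, judges)
    for _ in range(max_revisions):
        if not rejected:
            break
        # Revise rejected cards, then re-evaluate only the revisions.
        revised = [await revise(c) for c in rejected]
        newly_approved, rejected = await assess(revised, judges)
        approved += newly_approved
    return approved

cards = [Card("What is a closure?", "A function plus its captured scope.")]
print(asyncio.run(pipeline(cards, [accuracy_judge, clarity_judge])))
```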
## Cost Optimization
### Model Selection
- **Generation**: GPT-4o for accuracy
- **Simple Judges**: GPT-4o-mini for cost efficiency
- **Critical Judges**: GPT-4o for quality
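A judge configuration in the same YAML format might encode these choices (the file name and keys here are assumptions):
```yaml
# config/agents/defaults/judges.yaml (file name assumed)
agents:
  clarity_judge:
    model: "gpt-4o-mini"  # simple check, cost-effective
  content_accuracy_judge:
    model: "gpt-4o"       # accuracy-critical
```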
### Caching Strategy
- Response caching at agent level
- Shared cache across similar requests
- Configurable cache TTL
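As a rough illustration of the idea (not AnkiGen's actual cache), a TTL'd response cache keyed on a canonical form of the request could look like:
```python
import hashlib
import json
import time

class ResponseCache:
    """Caches agent responses; similar requests share a key via canonical JSON."""

    def __init__(self, ttl_seconds: float = 3600.0):
        self.ttl = ttl_seconds
        self._store = {}

    def _key(self, request: dict) -> str:
        canonical = json.dumps(request, sort_keys=True)
        return hashlib.sha256(canonical.encode()).hexdigest()

    def get(self, request: dict):
        entry = self._store.get(self._key(request))
        if entry is not None and time.time() - entry[0] < self.ttl:
            return entry[1]
        return None  # missing or expired

    def put(self, request: dict, response) -> None:
        self._store[self._key(request)] = (time.time(), response)
```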
### Parallel Processing
- Judge agents run in parallel
- Batch processing for multiple cards
- Async execution throughout
## Migration Strategy
### Gradual Rollout
1. Start with single judge agent
2. Enable A/B testing
3. Gradually enable more agents
4. Monitor quality improvements
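For example, the first two stages might translate to:
```bash
# Stage 1: hybrid mode with a single judge
export ANKIGEN_AGENT_MODE=hybrid
export ANKIGEN_ENABLE_CLARITY_JUDGE=true

# Stage 2: A/B test against legacy on a small share of traffic
export ANKIGEN_AGENT_MODE=a_b_test
export ANKIGEN_AB_TEST_RATIO=0.1
```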
### Rollback Plan
- Keep legacy system as fallback
- Feature flags for quick disable
- Performance comparison dashboards
### Success Metrics
- 20%+ improvement in card quality scores
- Reduced manual editing needs
- Better user satisfaction ratings
- Maintained or improved generation speed
## Troubleshooting
### Common Issues
#### Agents Not Initializing
- Check OpenAI API key validity
- Verify agent mode configuration
- Check feature flag settings
#### Poor Quality Results
- Adjust judge consensus thresholds
- Enable more specialized judges
- Review agent configuration prompts
#### Performance Issues
- Enable caching
- Use parallel processing
- Optimize model selection
### Debug Mode
```bash
export ANKIGEN_ENABLE_AGENT_TRACING=true
```
Enables detailed logging and the tracing UI for workflow debugging.
## Examples
### Basic Usage
```python
# Simple generation with agents
cards, metadata = await orchestrator.generate_cards_with_agents(
    topic="Machine Learning",
    subject="data_science",
    num_cards=10,
)
```
### Advanced Configuration
```python
# Custom enhancement targets
cards = await enhancement_agent.enhance_card_batch(
    cards=cards,
    enhancement_targets=["prerequisites", "learning_outcomes", "examples"],
)
```
### Quality Pipeline
```python
# Manual quality assessment
judge_results = await judge_coordinator.coordinate_judgment(
    cards=cards,
    enable_parallel=True,
    min_consensus=0.8,
)
```
## Contributing
### Adding New Agents
1. Inherit from `BaseAgentWrapper`
2. Add configuration in YAML files
3. Update feature flags
4. Add to coordinator workflows
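A skeleton for step 1 might look like the following (the module path and hook name are assumptions; check the real `BaseAgentWrapper` interface):
```python
# Hypothetical skeleton: the import path and execute() hook are assumptions.
from ankigen_core.agents.base import BaseAgentWrapper

class MnemonicAgent(BaseAgentWrapper):
    """Illustrative agent that would attach memory aids to cards."""

    async def execute(self, cards):
        # A real implementation would call the underlying model here
        # and enrich each card before returning it.
        return cards
```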
### Testing
```bash
python -m pytest tests/unit/test_agents/
python -m pytest tests/integration/test_agent_workflows.py
```
## Support
For issues and questions:
- Check the troubleshooting guide
- Review agent tracing logs
- Monitor performance metrics
- Enable debug mode for detailed logging