|
# AnkiGen Agent System |
|
|
|
A multi-agent system that generates high-quality flashcards by coordinating specialized AI agents for generation, quality judging, and enhancement.
|
|
|
## Overview |
|
|
|
The AnkiGen Agent System replaces the traditional single-LLM approach with a pipeline of specialized agents: |
|
|
|
- **Generator Agents**: Create cards with domain expertise |
|
- **Judge Agents**: Assess quality using multiple criteria |
|
- **Enhancement Agents**: Improve and enrich card content |
|
- **Coordinators**: Orchestrate workflows and handoffs |
|
|
|
## Quick Start |
|
|
|
### 1. Installation |
|
|
|
```bash
pip install openai-agents pyyaml
```
|
|
|
### 2. Environment Configuration |
|
|
|
Create a `.env` file or set environment variables: |
|
|
|
```bash
# Basic agent mode
export ANKIGEN_AGENT_MODE=hybrid

# Enable specific agents
export ANKIGEN_ENABLE_SUBJECT_EXPERT=true
export ANKIGEN_ENABLE_CONTENT_JUDGE=true
export ANKIGEN_ENABLE_CLARITY_JUDGE=true

# Performance settings
export ANKIGEN_AGENT_TIMEOUT=30.0
export ANKIGEN_MIN_JUDGE_CONSENSUS=0.6
```
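
AnkiGen reads these variables at startup. If you want to inspect them from your own code, the standard library is enough; a minimal sketch (`env_flag` is a hypothetical helper, not an AnkiGen API, and the fallback values simply mirror the example above):

```python
import os

def env_flag(name: str, default: bool = False) -> bool:
    """Parse a true/false feature flag from the environment."""
    return os.environ.get(name, str(default)).strip().lower() in {"1", "true", "yes"}

agent_mode = os.environ.get("ANKIGEN_AGENT_MODE")  # legacy / agent_only / hybrid / a_b_test
subject_expert_on = env_flag("ANKIGEN_ENABLE_SUBJECT_EXPERT")
agent_timeout = float(os.environ.get("ANKIGEN_AGENT_TIMEOUT", "30.0"))
min_consensus = float(os.environ.get("ANKIGEN_MIN_JUDGE_CONSENSUS", "0.6"))
```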
|
|
|
### 3. Usage |
|
|
|
```python
import asyncio

from ankigen_core.agents.integration import AgentOrchestrator
from ankigen_core.llm_interface import OpenAIClientManager

async def main():
    # Initialize the orchestrator with an OpenAI client
    client_manager = OpenAIClientManager()
    orchestrator = AgentOrchestrator(client_manager)
    await orchestrator.initialize("your-openai-api-key")

    # Generate cards with agents
    cards, metadata = await orchestrator.generate_cards_with_agents(
        topic="Python Functions",
        subject="programming",
        num_cards=5,
        difficulty="intermediate",
    )

asyncio.run(main())
```
|
|
|
## Agent Types |
|
|
|
### Generation Agents |
|
|
|
#### SubjectExpertAgent |
|
- **Purpose**: Domain-specific card generation |
|
- **Specializes in**: Technical accuracy, terminology, real-world applications
|
- **Configuration**: `ANKIGEN_ENABLE_SUBJECT_EXPERT=true` |
|
|
|
#### PedagogicalAgent |
|
- **Purpose**: Educational effectiveness review |
|
- **Specializes in**: Bloom's taxonomy, cognitive load, learning objectives
|
- **Configuration**: `ANKIGEN_ENABLE_PEDAGOGICAL_AGENT=true` |
|
|
|
#### ContentStructuringAgent |
|
- **Purpose**: Consistent formatting and organization |
|
- **Specializes in**: Metadata enrichment, standardization
|
- **Configuration**: `ANKIGEN_ENABLE_CONTENT_STRUCTURING=true` |
|
|
|
#### GenerationCoordinator |
|
- **Purpose**: Orchestrates multi-agent generation workflows |
|
- **Configuration**: `ANKIGEN_ENABLE_GENERATION_COORDINATOR=true` |
|
|
|
### Judge Agents |
|
|
|
#### ContentAccuracyJudge |
|
- **Evaluates**: Factual correctness, terminology, misconceptions |
|
- **Model**: GPT-4o (high accuracy needed) |
|
- **Configuration**: `ANKIGEN_ENABLE_CONTENT_JUDGE=true` |
|
|
|
#### PedagogicalJudge |
|
- **Evaluates**: Educational effectiveness, cognitive levels |
|
- **Model**: GPT-4o |
|
- **Configuration**: `ANKIGEN_ENABLE_PEDAGOGICAL_JUDGE=true` |
|
|
|
#### ClarityJudge |
|
- **Evaluates**: Communication clarity, readability |
|
- **Model**: GPT-4o-mini (cost-effective) |
|
- **Configuration**: `ANKIGEN_ENABLE_CLARITY_JUDGE=true` |
|
|
|
#### TechnicalJudge |
|
- **Evaluates**: Code syntax, best practices (technical content only) |
|
- **Model**: GPT-4o |
|
- **Configuration**: `ANKIGEN_ENABLE_TECHNICAL_JUDGE=true` |
|
|
|
#### CompletenessJudge |
|
- **Evaluates**: Required fields, metadata, quality standards |
|
- **Model**: GPT-4o-mini |
|
- **Configuration**: `ANKIGEN_ENABLE_COMPLETENESS_JUDGE=true` |
|
|
|
### Enhancement Agents |
|
|
|
#### RevisionAgent |
|
- **Purpose**: Improves rejected cards based on judge feedback |
|
- **Configuration**: `ANKIGEN_ENABLE_REVISION_AGENT=true` |
|
|
|
#### EnhancementAgent |
|
- **Purpose**: Adds missing content and enriches metadata |
|
- **Configuration**: `ANKIGEN_ENABLE_ENHANCEMENT_AGENT=true` |
|
|
|
## Operating Modes |
|
|
|
### Legacy Mode |
|
```bash
export ANKIGEN_AGENT_MODE=legacy
```
|
Uses the original single-LLM approach. |
|
|
|
### Agent-Only Mode |
|
```bash
export ANKIGEN_AGENT_MODE=agent_only
```
|
Forces use of the agent system for all generation.
|
|
|
### Hybrid Mode |
|
```bash
export ANKIGEN_AGENT_MODE=hybrid
```
|
Uses agents when they are enabled via feature flags and falls back to legacy generation otherwise.
|
|
|
### A/B Testing Mode |
|
```bash
export ANKIGEN_AGENT_MODE=a_b_test
export ANKIGEN_AB_TEST_RATIO=0.5
```
|
Randomly assigns users to agent or legacy generation for comparison.
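
The routing logic itself is internal to AnkiGen; conceptually it is a coin flip weighted by the ratio. A hypothetical sketch:

```python
import os
import random

ab_test_ratio = float(os.environ.get("ANKIGEN_AB_TEST_RATIO", "0.5"))

def route_to_agents() -> bool:
    """Send roughly ab_test_ratio of requests through the agent pipeline."""
    return random.random() < ab_test_ratio
```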
|
|
|
## Configuration |
|
|
|
### Agent Configuration Files |
|
|
|
Agents can be configured via YAML files in `config/agents/`: |
|
|
|
```yaml
# config/agents/defaults/generators.yaml
agents:
  subject_expert:
    instructions: "You are a world-class expert in {subject}..."
    model: "gpt-4o"
    temperature: 0.7
    timeout: 45.0
    custom_prompts:
      math: "Focus on problem-solving strategies"
      science: "Emphasize experimental design"
```
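
PyYAML (installed above) is enough to load and inspect such a file. A minimal sketch, assuming `{subject}` is a plain `str.format` placeholder:

```python
import yaml

with open("config/agents/defaults/generators.yaml") as f:
    config = yaml.safe_load(f)

expert = config["agents"]["subject_expert"]
print(expert["model"], expert["timeout"])  # gpt-4o 45.0
print(expert["instructions"].format(subject="math"))
print(expert["custom_prompts"]["math"])  # Focus on problem-solving strategies
```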
|
|
|
### Environment Variables |
|
|
|
#### Agent Control |
|
- `ANKIGEN_AGENT_MODE`: Operating mode (legacy/agent_only/hybrid/a_b_test) |
|
- `ANKIGEN_ENABLE_*`: Enable specific agents (true/false) |
|
|
|
#### Performance |
|
- `ANKIGEN_AGENT_TIMEOUT`: Agent execution timeout (seconds) |
|
- `ANKIGEN_MAX_AGENT_RETRIES`: Maximum retry attempts |
|
- `ANKIGEN_ENABLE_AGENT_CACHING`: Enable response caching |
|
|
|
#### Quality Control |
|
- `ANKIGEN_MIN_JUDGE_CONSENSUS`: Minimum agreement between judges (0.0-1.0) |
|
- `ANKIGEN_MAX_REVISION_ITERATIONS`: Maximum revision attempts |
|
|
|
## Monitoring & Metrics |
|
|
|
### Built-in Metrics |
|
The system automatically tracks: |
|
- Agent execution times and success rates |
|
- Quality approval/rejection rates |
|
- Token usage and costs |
|
- Judge consensus scores |
|
|
|
### Performance Dashboard |
|
```python
orchestrator = AgentOrchestrator(client_manager)
metrics = orchestrator.get_performance_metrics()

print(f"24h Performance: {metrics['agent_performance']}")
print(f"Quality Metrics: {metrics['quality_metrics']}")
```
|
|
|
### Tracing |
|
The OpenAI Agents SDK provides a built-in tracing UI for debugging workflows.
|
|
|
## Quality Pipeline |
|
|
|
### Phase 1: Generation |
|
1. Route to appropriate subject expert |
|
2. Generate initial cards |
|
3. Optional pedagogical review |
|
4. Optional content structuring |
|
|
|
### Phase 2: Quality Assessment |
|
1. Route cards to relevant judges |
|
2. Parallel evaluation by multiple specialists |
|
3. Calculate consensus scores |
|
4. Approve/reject based on thresholds (see the sketch below)
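
Conceptually, the consensus score is the fraction of judges that approved a card, compared against `ANKIGEN_MIN_JUDGE_CONSENSUS`. A hypothetical illustration (not AnkiGen's internal code):

```python
def judge_consensus(approvals: list[bool]) -> float:
    """Fraction of judges that approved the card."""
    return sum(approvals) / len(approvals) if approvals else 0.0

min_consensus = 0.6  # mirrors ANKIGEN_MIN_JUDGE_CONSENSUS

approvals = [True, True, False]  # e.g. content, clarity, completeness judges
print(judge_consensus(approvals) >= min_consensus)  # 0.67 >= 0.6 -> True
```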
|
|
|
### Phase 3: Improvement |
|
1. Revise rejected cards using judge feedback (loop sketched below)
|
2. Re-evaluate revised cards |
|
3. Enhance approved cards with additional content |
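
Taken together, Phase 3 is a bounded revise-and-re-judge loop capped by `ANKIGEN_MAX_REVISION_ITERATIONS`. A self-contained sketch with stub functions standing in for the real judge and revision agents:

```python
import asyncio

async def judges_approve(card: dict) -> bool:
    """Stub consensus check; the real check uses the judge agents above."""
    return card.get("score", 0.0) >= 0.6

async def revise(card: dict) -> dict:
    """Stub revision; the real RevisionAgent applies judge feedback."""
    return {**card, "score": card.get("score", 0.0) + 0.2}

async def improve(card: dict, max_iterations: int = 2) -> dict:
    """Revise a rejected card until judges approve or the iteration cap is hit."""
    for _ in range(max_iterations):
        if await judges_approve(card):
            return card
        card = await revise(card)
    return card

print(asyncio.run(improve({"score": 0.3})))  # {'score': 0.7}
```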
|
|
|
## Cost Optimization |
|
|
|
### Model Selection |
|
- **Generation**: GPT-4o for accuracy |
|
- **Simple Judges**: GPT-4o-mini for cost efficiency |
|
- **Critical Judges**: GPT-4o for quality |
|
|
|
### Caching Strategy |
|
- Response caching at agent level |
|
- Shared cache across similar requests |
|
- Configurable cache TTL (see the sketch below)
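
The TTL idea in miniature (a stand-alone sketch, not AnkiGen's cache implementation):

```python
import time

class TTLCache:
    """Tiny response cache whose entries expire after ttl_seconds."""

    def __init__(self, ttl_seconds: float = 300.0):
        self.ttl = ttl_seconds
        self._store: dict = {}

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.monotonic() > expires_at:
            del self._store[key]  # stale entry: drop it and report a miss
            return None
        return value

    def set(self, key, value):
        self._store[key] = (value, time.monotonic() + self.ttl)

cache = TTLCache(ttl_seconds=600.0)
cache.set(("Python Functions", "intermediate", 5), ["card 1", "card 2"])
print(cache.get(("Python Functions", "intermediate", 5)))  # ['card 1', 'card 2']
```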
|
|
|
### Parallel Processing |
|
- Judge agents run in parallel (see the sketch below)
|
- Batch processing for multiple cards |
|
- Async execution throughout |
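
The parallelism is plain `asyncio`; a sketch with stub judges (the real ones are the judge agents above):

```python
import asyncio

async def clarity_judge(card: dict) -> bool:
    await asyncio.sleep(0.1)  # simulate an API call
    return True

async def completeness_judge(card: dict) -> bool:
    await asyncio.sleep(0.1)
    return bool(card.get("answer"))

async def judge_card(card: dict) -> list:
    """Run all judges concurrently instead of one after another."""
    return await asyncio.gather(clarity_judge(card), completeness_judge(card))

print(asyncio.run(judge_card({"question": "Q?", "answer": "A."})))  # [True, True]
```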
|
|
|
## Migration Strategy |
|
|
|
### Gradual Rollout |
|
1. Start with single judge agent |
|
2. Enable A/B testing |
|
3. Gradually enable more agents |
|
4. Monitor quality improvements |
|
|
|
### Rollback Plan |
|
- Keep legacy system as fallback |
|
- Feature flags for quick disable |
|
- Performance comparison dashboards |
|
|
|
### Success Metrics |
|
- 20%+ improvement in card quality scores |
|
- Reduced manual editing needs |
|
- Better user satisfaction ratings |
|
- Maintained or improved generation speed |
|
|
|
## Troubleshooting |
|
|
|
### Common Issues |
|
|
|
#### Agents Not Initializing |
|
- Check OpenAI API key validity |
|
- Verify agent mode configuration |
|
- Check feature flag settings |
|
|
|
#### Poor Quality Results |
|
- Adjust judge consensus thresholds |
|
- Enable more specialized judges |
|
- Review agent configuration prompts |
|
|
|
#### Performance Issues |
|
- Enable caching |
|
- Use parallel processing |
|
- Optimize model selection |
|
|
|
### Debug Mode |
|
```bash
export ANKIGEN_ENABLE_AGENT_TRACING=true
```
|
|
|
Enables detailed logging and the tracing UI for workflow debugging.
|
|
|
## Examples

The snippets below run inside an async context and assume the relevant components (orchestrator, enhancement agent, judge coordinator) have already been initialized; see Quick Start for the orchestrator setup.
|
|
|
### Basic Usage |
|
```python
# Simple generation with agents
cards, metadata = await orchestrator.generate_cards_with_agents(
    topic="Machine Learning",
    subject="data_science",
    num_cards=10,
)
```
|
|
|
### Advanced Configuration |
|
```python
# Custom enhancement targets
cards = await enhancement_agent.enhance_card_batch(
    cards=cards,
    enhancement_targets=["prerequisites", "learning_outcomes", "examples"],
)
```
|
|
|
### Quality Pipeline |
|
```python
# Manual quality assessment
judge_results = await judge_coordinator.coordinate_judgment(
    cards=cards,
    enable_parallel=True,
    min_consensus=0.8,
)
```
|
|
|
## Contributing |
|
|
|
### Adding New Agents |
|
1. Inherit from `BaseAgentWrapper` (pattern sketched below)
|
2. Add configuration in YAML files |
|
3. Update feature flags |
|
4. Add to coordinator workflows |
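
A self-contained illustration of the wrapper pattern (every name here is hypothetical; the real `BaseAgentWrapper` defines its own constructor and hooks, so check the source before subclassing):

```python
import asyncio
from abc import ABC, abstractmethod

class AgentWrapperSketch(ABC):
    """Stand-in for BaseAgentWrapper: holds config and defines one async hook."""

    def __init__(self, instructions: str, model: str = "gpt-4o-mini"):
        self.instructions = instructions
        self.model = model

    @abstractmethod
    async def execute(self, payload: dict) -> dict:
        """Subclasses implement their specialized behavior here."""

class MnemonicAgent(AgentWrapperSketch):
    """Hypothetical enhancement-style agent that attaches a memory aid."""

    async def execute(self, payload: dict) -> dict:
        payload["mnemonic"] = f"Hook: {payload['question'][:40]}"
        return payload

agent = MnemonicAgent(instructions="Add a short mnemonic to each card.")
print(asyncio.run(agent.execute({"question": "What does GIL stand for?"})))
```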
|
|
|
### Testing |
|
```bash
python -m pytest tests/unit/test_agents/
python -m pytest tests/integration/test_agent_workflows.py
```
|
|
|
## Support |
|
|
|
For issues and questions: |
|
- Check the troubleshooting guide |
|
- Review agent tracing logs |
|
- Monitor performance metrics |
|
- Enable debug mode for detailed logging |