teachingAssistant / DEVELOPER_GUIDE.md

Michael Hu

add more logs

fdc056d 4 months ago


	# Developer Guide

	This guide explains how to extend the Audio Translation System with new TTS, STT, and translation providers, and how to contribute to the codebase.

	## Table of Contents

	- [Architecture Overview](#architecture-overview)
	- [Adding New TTS Providers](#adding-new-tts-providers)
	- [Adding New STT Providers](#adding-new-stt-providers)
	- [Adding New Translation Providers](#adding-new-translation-providers)
	- [Testing Guidelines](#testing-guidelines)
	- [Code Style and Standards](#code-style-and-standards)
	- [Debugging and Troubleshooting](#debugging-and-troubleshooting)
	- [Performance Considerations](#performance-considerations)
	- [Contributing Guidelines](#contributing-guidelines)

	## Architecture Overview

	The system follows Domain-Driven Design (DDD) principles with clear separation of concerns:

	```
	src/
	├── domain/                  # Core business logic
	│   ├── interfaces/          # Service contracts (ports)
	│   ├── models/              # Domain entities and value objects
	│   ├── services/            # Domain services
	│   └── exceptions.py        # Domain-specific exceptions
	├── application/             # Use case orchestration
	│   ├── services/            # Application services
	│   ├── dtos/                # Data transfer objects
	│   └── error_handling/      # Application error handling
	├── infrastructure/          # External service implementations
	│   ├── tts/                 # TTS provider implementations
	│   ├── stt/                 # STT provider implementations
	│   ├── translation/         # Translation service implementations
	│   ├── base/                # Provider base classes
	│   └── config/              # Configuration and DI container
	└── presentation/            # UI layer (app.py)
	```

	### Key Design Patterns

	1. Provider Pattern: Pluggable implementations for different services
	2. Factory Pattern: Provider creation with fallback logic
	3. Dependency Injection: Loose coupling between components
	4. Repository Pattern: Data access abstraction
	5. Strategy Pattern: Runtime algorithm selection
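The first two patterns can be illustrated with a minimal, self-contained sketch. Every name in it (`TTSProvider`, `UnavailableProvider`, `DummyProvider`, `ProviderFactory`, `create_with_fallback`) is hypothetical and chosen for illustration; the project's real factory and base classes may look different.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Type


class TTSProvider(ABC):
    """Hypothetical port: the contract every provider fulfils."""

    @abstractmethod
    def is_available(self) -> bool: ...

    @abstractmethod
    def synthesize(self, text: str) -> bytes: ...


class UnavailableProvider(TTSProvider):
    """Stands in for a provider whose dependencies are missing."""

    def is_available(self) -> bool:
        return False

    def synthesize(self, text: str) -> bytes:
        raise RuntimeError("engine not installed")


class DummyProvider(TTSProvider):
    """Always-available fallback that returns placeholder bytes."""

    def is_available(self) -> bool:
        return True

    def synthesize(self, text: str) -> bytes:
        return f"dummy:{text}".encode("utf-8")


class ProviderFactory:
    """Maps names to provider classes and resolves them with fallback."""

    def __init__(self) -> None:
        self._providers: Dict[str, Type[TTSProvider]] = {}

    def register(self, name: str, provider_cls: Type[TTSProvider]) -> None:
        self._providers[name] = provider_cls

    def create_with_fallback(self, preferred: List[str]) -> TTSProvider:
        """Return the first registered provider in `preferred` that is available."""
        for name in preferred:
            provider_cls = self._providers.get(name)
            if provider_cls is None:
                continue  # Unknown name: skip it and try the next one
            provider = provider_cls()
            if provider.is_available():
                return provider
        raise LookupError(f"No available provider among {preferred}")


factory = ProviderFactory()
factory.register("fancy", UnavailableProvider)
factory.register("dummy", DummyProvider)

provider = factory.create_with_fallback(["fancy", "dummy"])
audio = provider.synthesize("hello")
```

Because callers depend only on the abstract contract, the factory can substitute the fallback without any change on the calling side.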

	## Adding New TTS Providers

	### Step 1: Implement the Provider Class

	Create a new provider class that inherits from `TTSProviderBase`:

	```python
	# src/infrastructure/tts/my_tts_provider.py

	import logging
	from typing import Iterator, List, Optional
	from ..base.tts_provider_base import TTSProviderBase
	from ...domain.models.speech_synthesis_request import SpeechSynthesisRequest
	from ...domain.exceptions import SpeechSynthesisException

	logger = logging.getLogger(__name__)


	class MyTTSProvider(TTSProviderBase):
	    """Custom TTS provider implementation."""

	    def __init__(self, api_key: Optional[str] = None, **kwargs):
	        """Initialize the TTS provider.

	        Args:
	            api_key: Optional API key for cloud-based services
	            **kwargs: Additional provider-specific configuration
	        """
	        super().__init__(
	            provider_name="my_tts",
	            supported_languages=["en", "zh", "es", "fr"]
	        )
	        self.api_key = api_key
	        self._initialize_provider()

	    def _initialize_provider(self):
	        """Initialize provider-specific resources."""
	        try:
	            # Initialize your TTS engine/model here
	            # Example: self.engine = MyTTSEngine(api_key=self.api_key)
	            pass
	        except Exception as e:
	            logger.error(f"Failed to initialize {self.provider_name}: {e}")
	            raise SpeechSynthesisException(f"Provider initialization failed: {e}") from e

	    def is_available(self) -> bool:
	        """Check if the provider is available and ready to use."""
	        try:
	            # Check if dependencies are installed
	            # Check if models are loaded
	            # Check if API is accessible (for cloud services)
	            return True  # Replace with actual availability check
	        except Exception:
	            return False

	    def get_available_voices(self) -> List[str]:
	        """Get list of available voices for this provider."""
	        # Return actual voice IDs supported by your provider
	        return ["voice1", "voice2", "voice3"]

	    def _generate_audio(self, request: SpeechSynthesisRequest) -> tuple[bytes, int]:
	        """Generate audio data from synthesis request.

	        Args:
	            request: The speech synthesis request

	        Returns:
	            tuple: (audio_data_bytes, sample_rate)
	        """
	        try:
	            text = request.text_content.text
	            voice_id = request.voice_settings.voice_id
	            speed = request.voice_settings.speed

	            # Implement your TTS synthesis logic here
	            # Example:
	            # audio_data = self.engine.synthesize(
	            #     text=text,
	            #     voice=voice_id,
	            #     speed=speed
	            # )

	            # Return audio data and sample rate
	            audio_data = b"dummy_audio_data"  # Replace with actual synthesis
	            sample_rate = 22050  # Replace with actual sample rate

	            return audio_data, sample_rate

	        except Exception as e:
	            self._handle_provider_error(e, "audio generation")

	    def _generate_audio_stream(self, request: SpeechSynthesisRequest) -> Iterator[tuple[bytes, int, bool]]:
	        """Generate audio data stream from synthesis request.

	        Args:
	            request: The speech synthesis request

	        Yields:
	            tuple: (audio_data_bytes, sample_rate, is_final)
	        """
	        try:
	            # Implement streaming synthesis if supported
	            # For non-streaming providers, you can yield the complete audio as a single chunk

	            audio_data, sample_rate = self._generate_audio(request)
	            yield audio_data, sample_rate, True

	        except Exception as e:
	            self._handle_provider_error(e, "streaming audio generation")
	```
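While the real engine is not yet wired in, it can help to have `_generate_audio` return something playable instead of placeholder bytes. The sketch below builds a short sine tone with only the standard library. The helper name `generate_tone` is hypothetical, and the guide does not state whether the base class expects raw PCM or a WAV container, so treat the WAV container here as an assumption.

```python
import io
import math
import struct
import wave
from typing import Tuple


def generate_tone(duration_s: float = 0.1, frequency_hz: float = 440.0,
                  sample_rate: int = 22050) -> Tuple[bytes, int]:
    """Return (wav_bytes, sample_rate) for a mono 16-bit sine tone."""
    frame_count = round(duration_s * sample_rate)
    # Half-amplitude sine wave, packed as little-endian signed 16-bit samples
    frames = b"".join(
        struct.pack("<h", int(32767 * 0.5 * math.sin(2 * math.pi * frequency_hz * i / sample_rate)))
        for i in range(frame_count)
    )
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)       # mono
        wav.setsampwidth(2)       # 16-bit samples
        wav.setframerate(sample_rate)
        wav.writeframes(frames)
    return buffer.getvalue(), sample_rate


audio_data, sample_rate = generate_tone()
```

The returned pair has the same `(audio_data_bytes, sample_rate)` shape that `_generate_audio` documents, so it can stand in for the dummy values during local testing.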

	### Step 2: Register the Provider

	Add your provider to the factory registration:

	```python
	# src/infrastructure/tts/provider_factory.py

	def _register_default_providers(self):
	    """Register all available TTS providers."""
	    # ... existing providers ...

	    # Try to register your custom provider
	    try:
	        from .my_tts_provider import MyTTSProvider
	        self._providers['my_tts'] = MyTTSProvider
	        logger.info("Registered MyTTS provider")
	    except ImportError as e:
	        logger.info(f"MyTTS provider not available: {e}")
	```

	### Step 3: Add Configuration Support

	Update the configuration to include your provider:

	```python
	# src/infrastructure/config/app_config.py

	import os


	class AppConfig:
	    # ... existing configuration ...

	    # TTS Provider Configuration
	    TTS_PROVIDERS = os.getenv('TTS_PROVIDERS', 'kokoro,dia,cosyvoice2,my_tts,dummy').split(',')

	    # Provider-specific settings
	    MY_TTS_API_KEY = os.getenv('MY_TTS_API_KEY')
	    MY_TTS_MODEL = os.getenv('MY_TTS_MODEL', 'default')
	```
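A plain `split(',')` keeps surrounding whitespace and empty entries, so a value such as `kokoro, my_tts,` would yield `' my_tts'` and `''`. The following sketch shows one way to normalize the list. The helpers `parse_provider_list` and `load_tts_providers` are hypothetical, and lower-casing the names is an assumption about how provider keys are registered.

```python
import os
from typing import List, Mapping

DEFAULT_TTS_PROVIDERS = "kokoro,dia,cosyvoice2,my_tts,dummy"


def parse_provider_list(raw: str) -> List[str]:
    """Split a comma-separated list, dropping blanks and duplicates while keeping order."""
    seen = set()
    result = []
    for name in raw.split(","):
        name = name.strip().lower()  # Assumption: provider keys are lower-case
        if name and name not in seen:
            seen.add(name)
            result.append(name)
    return result


def load_tts_providers(env: Mapping[str, str] = os.environ) -> List[str]:
    """Read TTS_PROVIDERS from the given environment mapping, with the guide's default."""
    return parse_provider_list(env.get("TTS_PROVIDERS", DEFAULT_TTS_PROVIDERS))


cleaned = parse_provider_list(" kokoro, my_tts ,,dummy,kokoro")
```

Passing the environment as a mapping keeps the function easy to test without touching the real process environment.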

	### Step 4: Add Tests

	Create comprehensive tests for your provider:

	```python
	# tests/unit/infrastructure/tts/test_my_tts_provider.py

	import pytest
	from unittest.mock import Mock, patch
	from src.infrastructure.tts.my_tts_provider import MyTTSProvider
	from src.domain.models.speech_synthesis_request import SpeechSynthesisRequest
	from src.domain.models.text_content import TextContent
	from src.domain.models.voice_settings import VoiceSettings
	from src.domain.exceptions import SpeechSynthesisException


	class TestMyTTSProvider:
	    """Test suite for MyTTS provider."""

	    @pytest.fixture
	    def provider(self):
	        """Create a test provider instance."""
	        return MyTTSProvider(api_key="test_key")

	    @pytest.fixture
	    def synthesis_request(self):
	        """Create a test synthesis request."""
	        text_content = TextContent(text="Hello world", language="en")
	        voice_settings = VoiceSettings(voice_id="voice1", speed=1.0)
	        return SpeechSynthesisRequest(
	            text_content=text_content,
	            voice_settings=voice_settings
	        )

	    def test_provider_initialization(self, provider):
	        """Test provider initializes correctly."""
	        assert provider.provider_name == "my_tts"
	        assert "en" in provider.supported_languages
	        assert provider.is_available()

	    def test_get_available_voices(self, provider):
	        """Test voice listing."""
	        voices = provider.get_available_voices()
	        assert isinstance(voices, list)
	        assert len(voices) > 0
	        assert "voice1" in voices

	    def test_synthesize_success(self, provider, synthesis_request):
	        """Test successful synthesis."""
	        with patch.object(provider, '_generate_audio') as mock_generate:
	            mock_generate.return_value = (b"audio_data", 22050)

	            result = provider.synthesize(synthesis_request)

	            assert result.data == b"audio_data"
	            assert result.format == "wav"
	            assert result.sample_rate == 22050
	            mock_generate.assert_called_once_with(synthesis_request)

	    def test_synthesize_failure(self, provider, synthesis_request):
	        """Test synthesis failure handling."""
	        with patch.object(provider, '_generate_audio') as mock_generate:
	            mock_generate.side_effect = Exception("Synthesis failed")

	            with pytest.raises(SpeechSynthesisException):
	                provider.synthesize(synthesis_request)

	    def test_synthesize_stream(self, provider, synthesis_request):
	        """Test streaming synthesis."""
	        chunks = list(provider.synthesize_stream(synthesis_request))

	        assert len(chunks) > 0
	        assert chunks[-1].is_final  # Last chunk should be marked as final

	        # Verify chunk structure
	        for chunk in chunks:
	            assert hasattr(chunk, 'data')
	            assert hasattr(chunk, 'sample_rate')
	            assert hasattr(chunk, 'is_final')
	```

	### Step 5: Add Integration Tests

	```python
	# tests/integration/test_my_tts_integration.py

	import pytest
	from src.infrastructure.config.container_setup import initialize_global_container
	from src.infrastructure.tts.provider_factory import TTSProviderFactory
	from src.domain.models.speech_synthesis_request import SpeechSynthesisRequest
	from src.domain.models.text_content import TextContent
	from src.domain.models.voice_settings import VoiceSettings


	@pytest.mark.integration
	class TestMyTTSIntegration:
	    """Integration tests for MyTTS provider."""

	    def test_provider_factory_integration(self):
	        """Test provider works with factory."""
	        factory = TTSProviderFactory()

	        if 'my_tts' in factory.get_available_providers():
	            provider = factory.create_provider('my_tts')
	            assert provider.is_available()
	            assert len(provider.get_available_voices()) > 0

	    def test_end_to_end_synthesis(self):
	        """Test complete synthesis workflow."""
	        container = initialize_global_container()
	        factory = container.resolve(TTSProviderFactory)

	        if 'my_tts' in factory.get_available_providers():
	            provider = factory.create_provider('my_tts')

	            # Create synthesis request
	            text_content = TextContent(text="Integration test", language="en")
	            voice_settings = VoiceSettings(voice_id="voice1", speed=1.0)
	            request = SpeechSynthesisRequest(
	                text_content=text_content,
	                voice_settings=voice_settings
	            )

	            # Synthesize audio
	            result = provider.synthesize(request)

	            assert result.data is not None
	            assert result.duration > 0
	            assert result.sample_rate > 0
	```

	## Adding New STT Providers

	### Step 1: Implement the Provider Class

	```python
	# src/infrastructure/stt/my_stt_provider.py

	import logging
	from typing import List, Optional
	from ..base.stt_provider_base import STTProviderBase
	from ...domain.models.audio_content import AudioContent
	from ...domain.models.text_content import TextContent
	from ...domain.exceptions import SpeechRecognitionException

	logger = logging.getLogger(__name__)


	class MySTTProvider(STTProviderBase):
	    """Custom STT provider implementation."""

	    def __init__(self, model_path: Optional[str] = None, **kwargs):
	        """Initialize the STT provider.

	        Args:
	            model_path: Path to the STT model
	            **kwargs: Additional provider-specific configuration
	        """
	        super().__init__(
	            provider_name="my_stt",
	            supported_languages=["en", "zh", "es", "fr"],
	            supported_models=["my_stt_small", "my_stt_large"]
	        )
	        self.model_path = model_path
	        self._initialize_provider()

	    def _initialize_provider(self):
	        """Initialize provider-specific resources."""
	        try:
	            # Initialize your STT engine/model here
	            # Example: self.model = MySTTModel.load(self.model_path)
	            pass
	        except Exception as e:
	            logger.error(f"Failed to initialize {self.provider_name}: {e}")
	            raise SpeechRecognitionException(f"Provider initialization failed: {e}") from e

	    def is_available(self) -> bool:
	        """Check if the provider is available."""
	        try:
	            # Check dependencies, model availability, etc.
	            return True  # Replace with actual check
	        except Exception:
	            return False

	    def get_supported_models(self) -> List[str]:
	        """Get list of supported models."""
	        return self.supported_models

	    def _transcribe_audio(self, audio: AudioContent, model: str) -> tuple[str, float, dict]:
	        """Transcribe audio using the specified model.

	        Args:
	            audio: Audio content to transcribe
	            model: Model identifier to use

	        Returns:
	            tuple: (transcribed_text, confidence_score, metadata)
	        """
	        try:
	            # Implement your STT logic here
	            # Example:
	            # result = self.model.transcribe(
	            #     audio_data=audio.data,
	            #     sample_rate=audio.sample_rate,
	            #     model=model
	            # )

	            # Return transcription results
	            text = "Transcribed text"  # Replace with actual transcription
	            confidence = 0.95  # Replace with actual confidence
	            metadata = {
	                "model_used": model,
	                "processing_time": 1.5,
	                "language_detected": "en"
	            }

	            return text, confidence, metadata

	        except Exception as e:
	            self._handle_provider_error(e, "transcription")
	```

	### Step 2: Register and Test

	Follow Steps 2 to 5 of the TTS section for registration, configuration, and testing, substituting the corresponding STT modules.

	## Adding New Translation Providers

	### Step 1: Implement the Provider Class

	```python
	# src/infrastructure/translation/my_translation_provider.py

	import logging
	from typing import List, Optional
	from ..base.translation_provider_base import TranslationProviderBase
	from ...domain.models.translation_request import TranslationRequest
	from ...domain.models.text_content import TextContent
	from ...domain.exceptions import TranslationFailedException

	logger = logging.getLogger(__name__)


	class MyTranslationProvider(TranslationProviderBase):
	    """Custom translation provider implementation."""

	    def __init__(self, api_key: Optional[str] = None, **kwargs):
	        """Initialize the translation provider."""
	        super().__init__(
	            provider_name="my_translation",
	            supported_languages=["en", "zh", "es", "fr", "de", "ja"]
	        )
	        self.api_key = api_key
	        self._initialize_provider()

	    def _initialize_provider(self):
	        """Initialize provider-specific resources."""
	        try:
	            # Initialize your translation engine/model here
	            pass
	        except Exception as e:
	            logger.error(f"Failed to initialize {self.provider_name}: {e}")
	            raise TranslationFailedException(f"Provider initialization failed: {e}") from e

	    def is_available(self) -> bool:
	        """Check if the provider is available."""
	        try:
	            # Check dependencies, API connectivity, etc.
	            return True  # Replace with actual check
	        except Exception:
	            return False

	    def get_supported_language_pairs(self) -> List[tuple[str, str]]:
	        """Get supported language pairs."""
	        # Return list of (source_lang, target_lang) tuples
	        pairs = []
	        for source in self.supported_languages:
	            for target in self.supported_languages:
	                if source != target:
	                    pairs.append((source, target))
	        return pairs

	    def _translate_text(self, request: TranslationRequest) -> tuple[str, float, dict]:
	        """Translate text using the provider.

	        Args:
	            request: Translation request

	        Returns:
	            tuple: (translated_text, confidence_score, metadata)
	        """
	        try:
	            source_text = request.text_content.text
	            source_lang = request.source_language or request.text_content.language
	            target_lang = request.target_language

	            # Implement your translation logic here
	            # Example:
	            # result = self.translator.translate(
	            #     text=source_text,
	            #     source_lang=source_lang,
	            #     target_lang=target_lang
	            # )

	            # Return translation results
	            translated_text = f"Translated: {source_text}"  # Replace with actual translation
	            confidence = 0.92  # Replace with actual confidence
	            metadata = {
	                "source_language_detected": source_lang,
	                "target_language": target_lang,
	                "processing_time": 0.5,
	                "model_used": "my_translation_model"
	            }

	            return translated_text, confidence, metadata

	        except Exception as e:
	            self._handle_provider_error(e, "translation")
	```

	## Testing Guidelines

	### Unit Testing

	- Test each provider in isolation using mocks
	- Cover success and failure scenarios
	- Test edge cases (empty input, invalid parameters)
	- Verify error handling and exception propagation
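As a rough illustration of these points, the sketch below tests a stand-in provider against a mocked engine, covering success, an empty-input edge case, and error wrapping. `EchoTTSProvider` and `SynthesisError` are hypothetical stand-ins, not project classes. Plain `try`/`except` is used so the snippet runs without pytest; in the project's suite `pytest.raises` would be the natural choice.

```python
from unittest.mock import Mock


class SynthesisError(Exception):
    """Stand-in for a domain-specific exception."""


class EchoTTSProvider:
    """Hypothetical provider that delegates to an injected engine."""

    def __init__(self, engine):
        self._engine = engine

    def synthesize(self, text: str) -> bytes:
        if not text.strip():
            raise SynthesisError("text must not be empty")
        try:
            return self._engine.synthesize(text)
        except Exception as exc:
            raise SynthesisError(f"synthesis failed: {exc}") from exc


def test_success_uses_engine():
    engine = Mock()
    engine.synthesize.return_value = b"audio"
    assert EchoTTSProvider(engine).synthesize("hi") == b"audio"
    engine.synthesize.assert_called_once_with("hi")


def test_empty_input_rejected_before_engine_call():
    engine = Mock()
    try:
        EchoTTSProvider(engine).synthesize("   ")
    except SynthesisError:
        pass
    else:
        raise AssertionError("expected SynthesisError")
    engine.synthesize.assert_not_called()


def test_engine_failure_is_wrapped():
    engine = Mock()
    engine.synthesize.side_effect = RuntimeError("boom")
    try:
        EchoTTSProvider(engine).synthesize("hi")
    except SynthesisError as exc:
        # The original error stays reachable through exception chaining
        assert isinstance(exc.__cause__, RuntimeError)
    else:
        raise AssertionError("expected SynthesisError")
```

Injecting the engine keeps the provider testable in isolation, since no model or network access is needed.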

	### Integration Testing

	- Test provider integration with factories
	- Test complete pipeline workflows
	- Test fallback mechanisms
	- Test with real external services (when available)

	### Performance Testing

	- Measure processing times for different input sizes
	- Test memory usage and resource cleanup
	- Test concurrent processing capabilities
	- Benchmark against existing providers
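A minimal sketch of such a measurement, using only the standard library, might look like the following. `measure` and `fake_synthesize` are hypothetical names, and the workload is a placeholder rather than a real provider call. Note that `tracemalloc` adds overhead, so timings taken while tracing are only comparable with each other.

```python
import time
import tracemalloc
from typing import Callable, Dict


def measure(func: Callable[[str], bytes], payload: str, repeats: int = 5) -> Dict[str, float]:
    """Return the best wall-clock time (seconds) and peak traced memory (bytes)."""
    timings = []
    tracemalloc.start()
    try:
        for _ in range(repeats):
            start = time.perf_counter()
            func(payload)
            timings.append(time.perf_counter() - start)
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return {"best_seconds": min(timings), "peak_bytes": float(peak)}


def fake_synthesize(text: str) -> bytes:
    """Placeholder workload: two bytes of 'audio' per input character."""
    return text.encode("utf-8") * 2


# Compare behaviour across input sizes
results = {size: measure(fake_synthesize, "a" * size) for size in (100, 10_000)}
```

Taking the best of several runs reduces noise from other processes; for stricter benchmarks, the `timeit` module is an alternative.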

	### Test Structure

	```
	tests/
	├── unit/
	│   ├── domain/
	│   ├── application/
	│   └── infrastructure/
	│       ├── tts/
	│       ├── stt/
	│       └── translation/
	├── integration/
	│   ├── test_complete_pipeline.py
	│   ├── test_provider_fallback.py
	│   └── test_error_recovery.py
	└── performance/
	    ├── test_processing_speed.py
	    ├── test_memory_usage.py
	    └── test_concurrent_processing.py
	```

	## Code Style and Standards

	### Python Style Guide

	- Follow PEP 8 for code formatting
	- Use type hints for all public methods
	- Write comprehensive docstrings (Google style)
	- Use meaningful variable and function names
	- Keep functions focused and small (< 50 lines)

	### Documentation Standards

	- Document all public interfaces
	- Include usage examples in docstrings
	- Explain complex algorithms and business logic
	- Keep documentation up-to-date with code changes

	### Error Handling

	- Use domain-specific exceptions
	- Provide detailed error messages
	- Log errors with appropriate levels
	- Implement graceful degradation where possible
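One possible shape for these guidelines is sketched below: failures are logged at warning level, the next option is tried, and if everything fails a domain exception is raised that stays chained to the last underlying error. `ProviderError` and `first_successful` are hypothetical names standing in for the project's own exceptions and fallback logic.

```python
import logging
from typing import Callable, Optional, Sequence

logger = logging.getLogger(__name__)


class ProviderError(Exception):
    """Stand-in for a domain-specific exception such as SpeechSynthesisException."""


def first_successful(operations: Sequence[Callable[[], str]]) -> str:
    """Try each operation in order and degrade to the next one on failure."""
    last_error: Optional[Exception] = None
    for operation in operations:
        try:
            return operation()
        except Exception as exc:
            name = getattr(operation, "__name__", repr(operation))
            logger.warning("Operation %s failed, trying next: %s", name, exc)
            last_error = exc
    # Chain the last failure so the root cause stays visible in tracebacks
    raise ProviderError("all providers failed") from last_error


def broken() -> str:
    raise ConnectionError("service unreachable")


def working() -> str:
    return "translated text"


result = first_successful([broken, working])
```

Using `raise ... from` preserves the original traceback, which keeps the detailed error message available without leaking infrastructure exceptions into callers.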

	### Logging

	```python
	import logging

	logger = logging.getLogger(__name__)

	# Use appropriate log levels
	logger.debug("Detailed debugging information")
	logger.info("General information about program execution")
	logger.warning("Something unexpected happened")
	logger.error("A serious error occurred")
	logger.critical("A very serious error occurred")
	```

	## Debugging and Troubleshooting

	### Common Issues

	1. Provider Not Available
	   - Check dependencies are installed
	   - Verify configuration settings
	   - Check logs for initialization errors

	2. Poor Quality Output
	   - Verify input audio quality
	   - Check model parameters
	   - Review provider-specific settings

	3. Performance Issues
	   - Profile code execution
	   - Check memory usage
	   - Optimize audio processing pipeline

	### Debugging Tools

	- Use Python debugger (pdb) for step-through debugging
	- Enable detailed logging for troubleshooting
	- Use profiling tools (cProfile, memory_profiler)
	- Monitor system resources during processing
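For the profiling step, a small sketch with the standard-library `cProfile` and `pstats` modules is shown here. The function `slow_resample` is a made-up hot path and `profile_call` is a hypothetical helper; substitute the provider call you want to inspect.

```python
import cProfile
import io
import pstats


def slow_resample(samples: list, factor: int) -> list:
    """Placeholder hot path: naive sample repetition."""
    out = []
    for sample in samples:
        for _ in range(factor):
            out.append(sample)
    return out


def profile_call(func, *args, top: int = 5) -> str:
    """Run func under cProfile and return the top entries by cumulative time."""
    profiler = cProfile.Profile()
    profiler.enable()
    try:
        func(*args)
    finally:
        profiler.disable()
    buffer = io.StringIO()
    pstats.Stats(profiler, stream=buffer).sort_stats("cumulative").print_stats(top)
    return buffer.getvalue()


report = profile_call(slow_resample, list(range(1000)), 4)
print(report)
```

Sorting by cumulative time surfaces the functions whose call trees cost the most, which is usually the quickest way to find where to optimize.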

	### Logging Configuration

	```python
	# Enable debug logging for development
	import logging
	logging.basicConfig(
	    level=logging.DEBUG,
	    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s',
	    handlers=[
	        logging.FileHandler("debug.log"),
	        logging.StreamHandler()
	    ]
	)
	```

	## Performance Considerations

	### Optimization Strategies

	1. Audio Processing
	   - Use appropriate sample rates
	   - Implement streaming where possible
	   - Cache processed results
	   - Optimize memory usage

	2. Model Loading
	   - Load models once and reuse
	   - Use lazy loading for optional providers
	   - Implement model caching strategies

	3. Concurrent Processing
	   - Use async/await for I/O operations
	   - Implement thread-safe providers
	   - Consider multiprocessing for CPU-intensive tasks
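The "load once and reuse" and "thread-safe" points can be combined in a small wrapper, sketched below. `LazyModel` and `load_fake_model` are hypothetical; a real loader would read model weights from disk or a remote store.

```python
import threading
from typing import Callable, Generic, Optional, TypeVar

T = TypeVar("T")


class LazyModel(Generic[T]):
    """Loads a model on first use and reuses it afterwards (thread-safe)."""

    def __init__(self, loader: Callable[[], T]) -> None:
        self._loader = loader
        self._lock = threading.Lock()
        self._loaded = False
        self._model: Optional[T] = None

    def get(self) -> T:
        # Double-checked locking: skip the lock once the model is loaded
        if not self._loaded:
            with self._lock:
                if not self._loaded:
                    self._model = self._loader()
                    self._loaded = True
        return self._model


load_count = 0


def load_fake_model() -> dict:
    """Placeholder loader that counts how often it runs."""
    global load_count
    load_count += 1
    return {"name": "fake-model"}


model = LazyModel(load_fake_model)
threads = [threading.Thread(target=model.get) for _ in range(8)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()
```

Even with eight concurrent callers the loader runs only once, and importing the module costs nothing until a provider actually needs the model.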

	### Memory Management

	- Clean up temporary files
	- Release model resources when not needed
	- Monitor memory usage in long-running processes
	- Implement resource pooling for expensive operations
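For temporary files in particular, a context manager makes cleanup hard to forget. The sketch below uses only the standard library; `temporary_audio_file` is a hypothetical helper name and the payload is placeholder bytes, not valid audio.

```python
import os
import tempfile
from contextlib import contextmanager
from typing import Iterator


@contextmanager
def temporary_audio_file(data: bytes, suffix: str = ".wav") -> Iterator[str]:
    """Write data to a temp file, yield its path, and delete it afterwards."""
    fd, path = tempfile.mkstemp(suffix=suffix)
    try:
        with os.fdopen(fd, "wb") as handle:
            handle.write(data)
        yield path
    finally:
        # Runs even if the caller's block raises
        if os.path.exists(path):
            os.remove(path)


with temporary_audio_file(b"RIFF....") as path:
    existed_inside = os.path.exists(path)
    size_inside = os.path.getsize(path)
existed_after = os.path.exists(path)
```

The same pattern extends to model handles or engine sessions: acquire in the setup half, release in `finally`.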

	### Monitoring and Metrics

	- Track processing times
	- Monitor error rates
	- Measure resource utilization
	- Implement health checks
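A lightweight, in-process way to collect these numbers is sketched below. The `Metrics` class, its `track` decorator, and the `is_healthy` threshold are all hypothetical; a production setup would more likely export to a dedicated metrics system.

```python
import time
from collections import defaultdict
from functools import wraps


class Metrics:
    """Collects call counts, error counts, and total processing time per operation."""

    def __init__(self) -> None:
        self.calls = defaultdict(int)
        self.errors = defaultdict(int)
        self.total_seconds = defaultdict(float)

    def track(self, name: str):
        """Decorator that records timing and failures for the wrapped function."""
        def decorator(func):
            @wraps(func)
            def wrapper(*args, **kwargs):
                start = time.perf_counter()
                self.calls[name] += 1
                try:
                    return func(*args, **kwargs)
                except Exception:
                    self.errors[name] += 1
                    raise
                finally:
                    self.total_seconds[name] += time.perf_counter() - start
            return wrapper
        return decorator

    def error_rate(self, name: str) -> float:
        calls = self.calls[name]
        return self.errors[name] / calls if calls else 0.0

    def is_healthy(self, name: str, max_error_rate: float = 0.5) -> bool:
        """Simple health check: the error rate must not exceed the threshold."""
        return self.error_rate(name) <= max_error_rate


metrics = Metrics()


@metrics.track("translate")
def translate(text: str) -> str:
    if not text:
        raise ValueError("empty input")
    return text.upper()


translate("hello")
try:
    translate("")
except ValueError:
    pass
```

After these two calls the tracked error rate for `translate` is 0.5, which a health endpoint could compare against a configured threshold.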

	## Contributing Guidelines

	### Development Workflow

	1. Fork the repository
	2. Create a feature branch
	3. Implement changes with tests
	4. Run the full test suite
	5. Submit a pull request

	### Code Review Process

	- All changes require code review
	- Tests must pass before merging
	- Documentation must be updated
	- Performance impact should be assessed

	### Release Process

	- Follow semantic versioning
	- Update changelog
	- Tag releases appropriately
	- Deploy to staging before production

	---

	For questions or support, please refer to the project documentation or open an issue in the repository.