---
title: TeachingAssistant
emoji: 🚀
colorFrom: gray
colorTo: blue
sdk: streamlit
sdk_version: 1.44.1
app_file: app.py
pinned: false
---

# Audio Translation System

A high-quality audio translation system built using Domain-Driven Design (DDD) principles. The application processes audio through a three-stage pipeline: Speech-to-Text (STT) → Translation → Text-to-Speech (TTS). Each stage supports multiple providers and falls back automatically to the next provider when one fails.

## 🏗️ Architecture Overview

The application follows a clean DDD architecture with clear separation of concerns:

```
src/
├── domain/                 # 🧠 Business logic and rules
│   ├── models/             # Domain entities and value objects
│   ├── services/           # Domain services
│   ├── interfaces/         # Domain interfaces (ports)
│   └── exceptions.py       # Domain-specific exceptions
├── application/            # 🎯 Use case orchestration
│   ├── services/           # Application services
│   ├── dtos/               # Data transfer objects
│   └── error_handling/     # Application error handling
├── infrastructure/         # 🔧 External concerns
│   ├── tts/                # TTS provider implementations
│   ├── stt/                # STT provider implementations
│   ├── translation/        # Translation service implementations
│   ├── base/               # Provider base classes
│   └── config/             # Configuration and DI container
└── presentation/           # 🖥️ UI layer
    └── (Streamlit app in app.py)
```

### 🔄 Data Flow

```mermaid
graph TD
    A[User Upload] --> B[Presentation Layer]
    B --> C[Application Service]
    C --> D[Domain Service]
    D --> E[STT Provider]
    D --> F[Translation Provider]
    D --> G[TTS Provider]
    E --> H[Infrastructure Layer]
    F --> H
    G --> H
    H --> I[External Services]
```

## 🚀 Quick Start

### Prerequisites

- Python 3.9+
- FFmpeg (for audio processing)
- Optional: CUDA for GPU acceleration

### Installation

1. **Clone the repository:**
   ```bash
   git clone <repository-url>
   cd audio-translation-system
   ```

2. **Install dependencies:**
   ```bash
   pip install -r requirements.txt
   ```

3. **Run the application:**
   ```bash
   streamlit run app.py
   ```

4. **Access the web interface:**
   Open your browser to `http://localhost:8501`

## 🎛️ Supported Providers

### Speech-to-Text (STT)

- **Whisper** (Default) - OpenAI's Whisper Large-v3 model
- **Parakeet** - NVIDIA's Parakeet-TDT-0.6B model (requires NeMo Toolkit)

### Translation

- **NLLB** - Meta's No Language Left Behind model

### Text-to-Speech (TTS)

- **Kokoro** - High-quality neural TTS
- **Dia** - Fast neural TTS
- **CosyVoice2** - Advanced voice synthesis
- **Dummy** - Test provider for development

## 📖 Usage

### Web Interface

1. **Upload Audio**: Supports WAV, MP3, FLAC, and OGG formats
2. **Select Model**: Choose the STT model (Whisper/Parakeet)
3. **Choose Language**: Select the target translation language
4. **Pick Voice**: Select the TTS voice and speed
5. **Process**: Click to start the translation pipeline

### Programmatic Usage

```python
import os

from src.infrastructure.config.container_setup import initialize_global_container
from src.application.services.audio_processing_service import AudioProcessingApplicationService
from src.application.dtos.processing_request_dto import ProcessingRequestDto
from src.application.dtos.audio_upload_dto import AudioUploadDto

# Initialize dependency container
container = initialize_global_container()
audio_service = container.resolve(AudioProcessingApplicationService)

# Create request
with open("audio.wav", "rb") as f:
    audio_upload = AudioUploadDto(
        filename="audio.wav",
        content=f.read(),
        content_type="audio/wav",
        size=os.path.getsize("audio.wav")
    )

request = ProcessingRequestDto(
    audio=audio_upload,
    asr_model="whisper-small",
    target_language="zh",
    voice="kokoro",
    speed=1.0
)

# Process audio
result = audio_service.process_audio_pipeline(request)

if result.success:
    print(f"Original: {result.original_text}")
    print(f"Translated: {result.translated_text}")
    print(f"Audio saved to: {result.audio_path}")
else:
    print(f"Error: {result.error_message}")
```

## 🧪 Testing

The project includes comprehensive test coverage:

```bash
# Run all tests
python -m pytest

# Run specific test categories
python -m pytest tests/unit/         # Unit tests
python -m pytest tests/integration/  # Integration tests

# Run with coverage
python -m pytest --cov=src --cov-report=html
```

### Test Structure

- **Unit Tests**: Test individual components in isolation
- **Integration Tests**: Test provider integrations and the complete pipeline
- **Mocking**: Uses dependency injection for easy mocking

## 🔧 Configuration

### Environment Variables

Create a `.env` file or set environment variables:

```bash
# Provider preferences (comma-separated, in order of preference)
TTS_PROVIDERS=kokoro,dia,cosyvoice2,dummy
STT_PROVIDERS=whisper,parakeet
TRANSLATION_PROVIDERS=nllb

# Logging
LOG_LEVEL=INFO
LOG_FILE=app.log

# Performance
MAX_FILE_SIZE_MB=100
TEMP_FILE_CLEANUP_HOURS=24
```

### Provider Configuration

The system automatically detects available providers and falls back gracefully:

```python
# Example: Custom provider configuration
from src.infrastructure.config.dependency_container import DependencyContainer

container = DependencyContainer()
container.configure_tts_providers(['kokoro', 'dummy'])  # Preferred order
```

## 🏗️ Architecture Benefits

### 🎯 Domain-Driven Design

- **Clear Business Logic**: Domain layer contains pure business rules
- **Separation of Concerns**: Each layer has distinct responsibilities
- **Testability**: Easy to test business logic independently

### 🔌 Dependency Injection

- **Loose Coupling**: Components depend on abstractions, not implementations
- **Easy Testing**: Mock dependencies for unit testing
- **Flexibility**: Swap implementations without changing business logic

### 🛡️ Error Handling

- **Layered Exceptions**: Domain exceptions mapped to user-friendly messages
- **Graceful Fallbacks**: Automatic provider fallback on failures
- **Structured Logging**: Correlation IDs and detailed error tracking

### 📈 Extensibility

- **Plugin Architecture**: Add new providers by implementing interfaces
- **Configuration-Driven**: Change behavior through configuration
- **Provider Factories**: Automatic provider discovery and instantiation

## 🔍 Troubleshooting

### Common Issues

**Import Errors:**

```bash
# Ensure all dependencies are installed
pip install -r requirements.txt

# For Parakeet support
pip install 'nemo_toolkit[asr]'
```

**Audio Processing Errors:**

- Verify FFmpeg is installed and in PATH
- Check that the audio file format is supported
- Ensure sufficient disk space for temporary files

**Provider Unavailable:**

- Check provider-specific dependencies
- Review logs for detailed error messages
- Verify provider configuration

### Debug Mode

Enable detailed logging:

```python
import logging

logging.basicConfig(level=logging.DEBUG)
```

## 🤝 Contributing

### Adding New Providers

See [DEVELOPER_GUIDE.md](DEVELOPER_GUIDE.md) for detailed instructions on:

- Implementing new TTS providers
- Adding STT models
- Extending translation services
- Writing tests

### Development Setup

1. **Install development dependencies:**
   ```bash
   pip install -r requirements-dev.txt
   ```

2. **Run pre-commit hooks:**
   ```bash
   pre-commit install
   pre-commit run --all-files
   ```

3. **Run tests before committing:**
   ```bash
   python -m pytest tests/
   ```

## 📄 License

This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.

---

For detailed developer documentation, see [DEVELOPER_GUIDE.md](DEVELOPER_GUIDE.md)
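## Appendix: Provider Fallback Sketch

The automatic provider fallback mentioned under Supported Providers and Error Handling can be illustrated with a small, standalone sketch. This is not the project's actual implementation: `run_with_fallback`, `AllProvidersFailedError`, `broken_tts`, and `dummy_tts` are hypothetical names chosen for illustration. The assumption is that providers are tried in the configured preference order (as in `TTS_PROVIDERS`) and that the first one to succeed wins.

```python
from typing import Callable, List, Sequence, Tuple


class AllProvidersFailedError(RuntimeError):
    """Hypothetical error raised when no provider in the list succeeds."""


def run_with_fallback(
    providers: Sequence[Tuple[str, Callable[[str], bytes]]],
    payload: str,
) -> Tuple[str, bytes]:
    """Try each (name, callable) pair in order; return the first success."""
    errors: List[str] = []
    for name, call in providers:
        try:
            return name, call(payload)
        except Exception as exc:  # record the failure and try the next provider
            errors.append(f"{name}: {exc}")
    raise AllProvidersFailedError("; ".join(errors))


# Stand-in providers (assumptions, not real Kokoro or Dummy integrations).
def broken_tts(text: str) -> bytes:
    raise RuntimeError("model not installed")


def dummy_tts(text: str) -> bytes:
    return f"<audio for {text!r}>".encode()


used, audio = run_with_fallback(
    [("kokoro", broken_tts), ("dummy", dummy_tts)],
    "hello",
)
print(f"Provider used: {used}")
```

Collecting the per-provider error messages before raising keeps the failure of every attempted provider visible in the final exception, which fits the README's emphasis on detailed error tracking.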