| title: TeachingAssistant | |
| emoji: π | |
| colorFrom: gray | |
| colorTo: blue | |
| sdk: streamlit | |
| sdk_version: 1.44.1 | |
| app_file: app.py | |
| pinned: false | |
| Check out the configuration reference at <https://huggingface.co/docs/hub/spaces-config-reference> | |
| # Audio Translation System | |
| A high-quality audio translation system built using Domain-Driven Design (DDD) principles. The application processes audio through a pipeline: Speech-to-Text (STT) → Translation → Text-to-Speech (TTS), supporting multiple providers for each service with automatic fallback mechanisms. | |
| ## Architecture Overview | |
| The application follows a clean DDD architecture with clear separation of concerns: | |
| ``` | |
| src/ | |
| ├── domain/                # Business logic and rules | |
| │   ├── models/            # Domain entities and value objects | |
| │   ├── services/          # Domain services | |
| │   ├── interfaces/        # Domain interfaces (ports) | |
| │   └── exceptions.py      # Domain-specific exceptions | |
| ├── application/           # Use case orchestration | |
| │   ├── services/          # Application services | |
| │   ├── dtos/              # Data transfer objects | |
| │   └── error_handling/    # Application error handling | |
| ├── infrastructure/        # External concerns | |
| │   ├── tts/               # TTS provider implementations | |
| │   ├── stt/               # STT provider implementations | |
| │   ├── translation/       # Translation service implementations | |
| │   ├── base/              # Provider base classes | |
| │   └── config/            # Configuration and DI container | |
| └── presentation/          # UI layer | |
|     └── (Streamlit app in app.py) | |
| ``` | |
| ### Data Flow | |
| ```mermaid | |
| graph TD | |
| A[User Upload] --> B[Presentation Layer] | |
| B --> C[Application Service] | |
| C --> D[Domain Service] | |
| D --> E[STT Provider] | |
| D --> F[Translation Provider] | |
| D --> G[TTS Provider] | |
| E --> H[Infrastructure Layer] | |
| F --> H | |
| G --> H | |
| H --> I[External Services] | |
| ``` | |
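The flow in the diagram can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the project's actual code: the `SpeechToText`, `Translator`, and `TextToSpeech` protocols, the `PipelineResult` class, and `run_pipeline` are hypothetical names standing in for the domain interfaces (ports) and the domain service, and the three stub providers exist only to make the example runnable.

```python
from dataclasses import dataclass
from typing import Protocol


class SpeechToText(Protocol):
    def transcribe(self, audio: bytes) -> str: ...


class Translator(Protocol):
    def translate(self, text: str, target_language: str) -> str: ...


class TextToSpeech(Protocol):
    def synthesize(self, text: str) -> bytes: ...


@dataclass
class PipelineResult:
    original_text: str
    translated_text: str
    audio: bytes


def run_pipeline(audio: bytes, stt: SpeechToText, translator: Translator,
                 tts: TextToSpeech, target_language: str) -> PipelineResult:
    """Run the three stages in order: STT, then translation, then TTS."""
    original = stt.transcribe(audio)
    translated = translator.translate(original, target_language)
    return PipelineResult(original, translated, tts.synthesize(translated))


# Stub providers (hypothetical) so the sketch runs without any model.
class EchoSTT:
    def transcribe(self, audio: bytes) -> str:
        return audio.decode("utf-8")


class TaggingTranslator:
    def translate(self, text: str, target_language: str) -> str:
        return f"[{target_language}] {text.upper()}"


class BytesTTS:
    def synthesize(self, text: str) -> bytes:
        return text.encode("utf-8")


result = run_pipeline(b"hello", EchoSTT(), TaggingTranslator(), BytesTTS(), "zh")
```

Because `run_pipeline` only depends on the three protocols, any provider that offers the matching method can be passed in without changing the pipeline itself.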
| ## Quick Start | |
| ### Prerequisites | |
| - Python 3.9+ | |
| - FFmpeg (for audio processing) | |
| - Optional: CUDA for GPU acceleration | |
| ### Installation | |
| 1. **Clone the repository:** | |
| ```bash | |
| git clone <repository-url> | |
| cd audio-translation-system | |
| ``` | |
| 2. **Install dependencies:** | |
| ```bash | |
| pip install -r requirements.txt | |
| ``` | |
| 3. **Run the application:** | |
| ```bash | |
| streamlit run app.py | |
| ``` | |
| 4. **Access the web interface:** | |
| Open your browser to `http://localhost:8501` | |
| ## Supported Providers | |
| ### Speech-to-Text (STT) | |
| - **Whisper** (Default) - OpenAI's Whisper Large-v3 model | |
| - **Parakeet** - NVIDIA's Parakeet-TDT-0.6B model (requires NeMo Toolkit) | |
| ### Translation | |
| - **NLLB** - Meta's No Language Left Behind model | |
| ### Text-to-Speech (TTS) | |
| - **Kokoro** - High-quality neural TTS | |
| - **Dia** - Fast neural TTS | |
| - **CosyVoice2** - Advanced voice synthesis | |
| - **Dummy** - Test provider for development | |
| ## Usage | |
| ### Web Interface | |
| 1. **Upload Audio**: Supported formats are WAV, MP3, FLAC, and OGG | |
| 2. **Select Model**: Choose STT model (Whisper/Parakeet) | |
| 3. **Choose Language**: Select target translation language | |
| 4. **Pick Voice**: Select TTS voice and speed | |
| 5. **Process**: Click to start the translation pipeline | |
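The upload step implies a check on file type and size before the pipeline starts. A minimal sketch of such a check follows. `validate_upload` and `SUPPORTED_EXTENSIONS` are hypothetical names, and the 100 MB default simply mirrors the `MAX_FILE_SIZE_MB=100` example value from the Configuration section; the project's real validation may differ.

```python
from pathlib import Path

# Formats listed in the Web Interface steps above.
SUPPORTED_EXTENSIONS = {".wav", ".mp3", ".flac", ".ogg"}


def validate_upload(filename: str, size_bytes: int, max_size_mb: int = 100) -> list[str]:
    """Return a list of problems; an empty list means the upload looks acceptable."""
    problems = []
    extension = Path(filename).suffix.lower()
    if extension not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported format: {extension or '(none)'}")
    if size_bytes <= 0:
        problems.append("file is empty")
    elif size_bytes > max_size_mb * 1024 * 1024:
        problems.append(f"file exceeds {max_size_mb} MB")
    return problems
```

Returning a list of problems instead of raising on the first one lets a UI show every issue at once.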
| ### Programmatic Usage | |
| ```python | |
| import os | |
| from src.infrastructure.config.container_setup import initialize_global_container | |
| from src.application.services.audio_processing_service import AudioProcessingApplicationService | |
| from src.application.dtos.processing_request_dto import ProcessingRequestDto | |
| from src.application.dtos.audio_upload_dto import AudioUploadDto | |
| # Initialize the dependency container and resolve the application service | |
| container = initialize_global_container() | |
| audio_service = container.resolve(AudioProcessingApplicationService) | |
| # Create the upload DTO from a local file | |
| with open("audio.wav", "rb") as f: | |
|     audio_upload = AudioUploadDto( | |
|         filename="audio.wav", | |
|         content=f.read(), | |
|         content_type="audio/wav", | |
|         size=os.path.getsize("audio.wav") | |
|     ) | |
| # Describe what the pipeline should do with the audio | |
| request = ProcessingRequestDto( | |
|     audio=audio_upload, | |
|     asr_model="whisper-small", | |
|     target_language="zh", | |
|     voice="kokoro", | |
|     speed=1.0 | |
| ) | |
| # Process audio | |
| result = audio_service.process_audio_pipeline(request) | |
| if result.success: | |
|     print(f"Original: {result.original_text}") | |
|     print(f"Translated: {result.translated_text}") | |
|     print(f"Audio saved to: {result.audio_path}") | |
| else: | |
|     print(f"Error: {result.error_message}") | |
| ``` | |
| ## Testing | |
| The project includes unit and integration tests: | |
| ```bash | |
| # Run all tests | |
| python -m pytest | |
| # Run specific test categories | |
| python -m pytest tests/unit/ # Unit tests | |
| python -m pytest tests/integration/ # Integration tests | |
| # Run with coverage | |
| python -m pytest --cov=src --cov-report=html | |
| ``` | |
| ### Test Structure | |
| - **Unit Tests**: Test individual components in isolation | |
| - **Integration Tests**: Test provider integrations and complete pipeline | |
| - **Mocking**: Uses dependency injection for easy mocking | |
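As an illustration of how dependency injection makes mocking straightforward, the sketch below injects a `unittest.mock.Mock` in place of a real translation provider. `TranslationUseCase` is a hypothetical class written for this example and is not part of the project; only `unittest.mock` is a real API here.

```python
from unittest.mock import Mock


class TranslationUseCase:
    """Hypothetical use case that receives its translator through the constructor."""

    def __init__(self, translator):
        self._translator = translator

    def execute(self, text: str, target_language: str) -> str:
        if not text.strip():
            raise ValueError("text must not be empty")
        return self._translator.translate(text, target_language)


def test_use_case_delegates_to_translator():
    # The mock stands in for a real provider, so no model is loaded.
    translator = Mock()
    translator.translate.return_value = "bonjour"

    use_case = TranslationUseCase(translator)

    assert use_case.execute("hello", "fr") == "bonjour"
    translator.translate.assert_called_once_with("hello", "fr")


def test_use_case_rejects_empty_text():
    translator = Mock()
    use_case = TranslationUseCase(translator)
    try:
        use_case.execute("   ", "fr")
    except ValueError:
        pass
    else:
        raise AssertionError("expected ValueError for empty text")
    # The provider must not be called when validation fails.
    translator.translate.assert_not_called()
```

With pytest, both functions are collected automatically because their names start with `test_`.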
| ## Configuration | |
| ### Environment Variables | |
| Create a `.env` file or set environment variables: | |
| ```bash | |
| # Provider preferences (comma-separated, in order of preference) | |
| TTS_PROVIDERS=kokoro,dia,cosyvoice2,dummy | |
| STT_PROVIDERS=whisper,parakeet | |
| TRANSLATION_PROVIDERS=nllb | |
| # Logging | |
| LOG_LEVEL=INFO | |
| LOG_FILE=app.log | |
| # Performance | |
| MAX_FILE_SIZE_MB=100 | |
| TEMP_FILE_CLEANUP_HOURS=24 | |
| ``` | |
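One way such variables could be read is sketched below. The helper names `provider_preferences` and `max_file_size_bytes` are hypothetical, and the sketch assumes the values are already present in the process environment; loading a `.env` file is a separate step that this example does not cover.

```python
import os
from typing import Mapping, Optional


def provider_preferences(variable: str, default: str,
                         environ: Optional[Mapping[str, str]] = None) -> list[str]:
    """Parse a comma-separated provider list, keeping the order of preference."""
    env = os.environ if environ is None else environ
    raw = env.get(variable, default)
    # Ignore empty entries and surrounding whitespace; normalise to lower case.
    return [name.strip().lower() for name in raw.split(",") if name.strip()]


def max_file_size_bytes(environ: Optional[Mapping[str, str]] = None) -> int:
    """Convert MAX_FILE_SIZE_MB (default 100) to bytes."""
    env = os.environ if environ is None else environ
    return int(env.get("MAX_FILE_SIZE_MB", "100")) * 1024 * 1024


# Example with an explicit mapping instead of the real environment.
example_env = {"TTS_PROVIDERS": " Kokoro, dia ,,dummy"}
tts_order = provider_preferences("TTS_PROVIDERS", "kokoro,dummy", example_env)
```

Passing the mapping as a parameter keeps the helpers easy to test without touching `os.environ`.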
| ### Provider Configuration | |
| The system detects which providers are available and falls back to the next one when a provider fails. The preferred order can also be set in code: | |
| ```python | |
| # Example: Custom provider configuration | |
| from src.infrastructure.config.dependency_container import DependencyContainer | |
| container = DependencyContainer() | |
| container.configure_tts_providers(['kokoro', 'dummy']) # Preferred order | |
| ``` | |
| ## Architecture Benefits | |
| ### Domain-Driven Design | |
| - **Clear Business Logic**: Domain layer contains pure business rules | |
| - **Separation of Concerns**: Each layer has distinct responsibilities | |
| - **Testability**: Easy to test business logic independently | |
| ### Dependency Injection | |
| - **Loose Coupling**: Components depend on abstractions, not implementations | |
| - **Easy Testing**: Mock dependencies for unit testing | |
| - **Flexibility**: Swap implementations without changing business logic | |
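To make the idea concrete, here is a minimal sketch of a container with `register` and `resolve` methods. It is an assumption-laden illustration, not the project's `DependencyContainer`: `SimpleContainer`, `Translator`, `FakeTranslator`, and `TranslationService` are all hypothetical names.

```python
from abc import ABC, abstractmethod
from typing import Any, Callable, Dict, Type, TypeVar

T = TypeVar("T")


class SimpleContainer:
    """Hypothetical container: maps a type to a factory and caches the instance."""

    def __init__(self) -> None:
        self._factories: Dict[type, Callable[["SimpleContainer"], Any]] = {}
        self._instances: Dict[type, Any] = {}

    def register(self, key: type, factory: Callable[["SimpleContainer"], Any]) -> None:
        self._factories[key] = factory

    def resolve(self, key: Type[T]) -> T:
        # Build the instance on first use, then reuse it (singleton behaviour).
        if key not in self._instances:
            if key not in self._factories:
                raise KeyError(f"nothing registered for {key.__name__}")
            self._instances[key] = self._factories[key](self)
        return self._instances[key]


class Translator(ABC):
    @abstractmethod
    def translate(self, text: str, target_language: str) -> str: ...


class FakeTranslator(Translator):
    def translate(self, text: str, target_language: str) -> str:
        return f"{text} ({target_language})"


class TranslationService:
    """Depends on the Translator abstraction, not on a concrete provider."""

    def __init__(self, translator: Translator) -> None:
        self.translator = translator


container = SimpleContainer()
container.register(Translator, lambda c: FakeTranslator())
container.register(TranslationService, lambda c: TranslationService(c.resolve(Translator)))
service = container.resolve(TranslationService)
```

Swapping `FakeTranslator` for another implementation only changes the registration line; `TranslationService` stays untouched.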
| ### Error Handling | |
| - **Layered Exceptions**: Domain exceptions mapped to user-friendly messages | |
| - **Graceful Fallbacks**: Automatic provider fallback on failures | |
| - **Structured Logging**: Correlation IDs and detailed error tracking | |
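The fallback behaviour described in the list above could look roughly like the following. This is a sketch under assumptions rather than the project's implementation: `ProviderError` and `synthesize_with_fallback` are hypothetical names, and the failing "kokoro" callable only simulates an unavailable provider.

```python
import logging
from typing import Callable, Iterable, Tuple

logger = logging.getLogger(__name__)


class ProviderError(Exception):
    """Raised when every provider in the preference list has failed."""


def synthesize_with_fallback(
    providers: Iterable[Tuple[str, Callable[[str], bytes]]], text: str
) -> Tuple[str, bytes]:
    """Try each provider in order and return the first successful result."""
    errors = []
    for name, synthesize in providers:
        try:
            return name, synthesize(text)
        except Exception as exc:  # any provider failure triggers the fallback
            logger.warning("Provider %s failed: %s", name, exc)
            errors.append(f"{name}: {exc}")
    raise ProviderError("All providers failed: " + "; ".join(errors))


def broken_provider(text: str) -> bytes:
    # Simulates a provider whose model could not be loaded.
    raise RuntimeError("model not loaded")


def dummy_provider(text: str) -> bytes:
    return text.encode("utf-8")


used, audio = synthesize_with_fallback(
    [("kokoro", broken_provider), ("dummy", dummy_provider)], "hi"
)
```

Collecting the individual error messages means the final exception still explains why each provider was skipped.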
| ### Extensibility | |
| - **Plugin Architecture**: Add new providers by implementing interfaces | |
| - **Configuration-Driven**: Change behavior through configuration | |
| - **Provider Factories**: Automatic provider discovery and instantiation | |
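A plugin-style registry is one common way to get this kind of extensibility. The sketch below is illustrative only: `register_tts`, `create_tts`, `available_tts`, and `DummyTTS` are hypothetical names, and the project's own factories may work differently (see the developer guide for the real interfaces).

```python
from typing import Callable, Dict, List

# Maps a provider name to the class that implements it.
_TTS_REGISTRY: Dict[str, type] = {}


def register_tts(name: str) -> Callable[[type], type]:
    """Class decorator that makes a provider discoverable by name."""
    def decorator(cls: type) -> type:
        _TTS_REGISTRY[name] = cls
        return cls
    return decorator


def create_tts(name: str):
    """Instantiate a registered provider, or fail with a clear message."""
    try:
        return _TTS_REGISTRY[name]()
    except KeyError:
        raise ValueError(f"Unknown TTS provider: {name!r}") from None


def available_tts() -> List[str]:
    return sorted(_TTS_REGISTRY)


@register_tts("dummy")
class DummyTTS:
    """Stand-in provider that returns the text as bytes instead of audio."""

    def synthesize(self, text: str) -> bytes:
        return text.encode("utf-8")
```

Adding a provider then amounts to writing a class with a `synthesize` method and decorating it; no factory code needs to change.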
| ## Troubleshooting | |
| ### Common Issues | |
| **Import Errors:** | |
| ```bash | |
| # Ensure all dependencies are installed | |
| pip install -r requirements.txt | |
| # For Parakeet support | |
| pip install 'nemo_toolkit[asr]' | |
| ``` | |
| **Audio Processing Errors:** | |
| - Verify FFmpeg is installed and in PATH | |
| - Check audio file format is supported | |
| - Ensure sufficient disk space for temporary files | |
| **Provider Unavailable:** | |
| - Check provider-specific dependencies | |
| - Review logs for detailed error messages | |
| - Verify provider configuration | |
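Several of these checks can be automated with the standard library. The sketch below uses `shutil.which` to look for FFmpeg on PATH and `importlib.util.find_spec` to see whether optional packages are importable. `check_environment` is a hypothetical helper, and the default module name `nemo` is an assumption about the import name of the NeMo Toolkit.

```python
import importlib.util
import shutil
from typing import Dict, Iterable


def check_environment(modules: Iterable[str] = ("nemo",)) -> Dict[str, bool]:
    """Report whether FFmpeg and the given top-level modules are available."""
    report = {"ffmpeg": shutil.which("ffmpeg") is not None}
    for module in modules:
        # find_spec returns None for a missing top-level module without importing it.
        report[module] = importlib.util.find_spec(module) is not None
    return report


if __name__ == "__main__":
    for name, available in check_environment().items():
        print(f"{name}: {'ok' if available else 'missing'}")
```

Only top-level module names should be passed in, since `find_spec` may raise for a dotted name whose parent package is not installed.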
| ### Debug Mode | |
| Enable detailed logging: | |
| ```python | |
| import logging | |
| logging.basicConfig(level=logging.DEBUG) | |
| ``` | |
| ## Contributing | |
| ### Adding New Providers | |
| See [DEVELOPER_GUIDE.md](DEVELOPER_GUIDE.md) for detailed instructions on: | |
| - Implementing new TTS providers | |
| - Adding STT models | |
| - Extending translation services | |
| - Writing tests | |
| ### Development Setup | |
| 1. **Install development dependencies:** | |
| ```bash | |
| pip install -r requirements-dev.txt | |
| ``` | |
| 2. **Run pre-commit hooks:** | |
| ```bash | |
| pre-commit install | |
| pre-commit run --all-files | |
| ``` | |
| 3. **Run tests before committing:** | |
| ```bash | |
| python -m pytest tests/ | |
| ``` | |
| ## License | |
| This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details. | |
| --- | |
| For detailed developer documentation, see [DEVELOPER_GUIDE.md](DEVELOPER_GUIDE.md) |