---
title: TeachingAssistant
emoji: 🚀
colorFrom: gray
colorTo: blue
sdk: streamlit
sdk_version: 1.44.1
app_file: app.py
pinned: false
---

Check out the configuration reference at <https://huggingface.co/docs/hub/spaces-config-reference>

# Audio Translation System

An audio translation system built on Domain-Driven Design (DDD) principles. Audio passes through a three-stage pipeline: Speech-to-Text (STT) → Translation → Text-to-Speech (TTS). Each stage is backed by one or more interchangeable providers, with automatic fallback to the next provider when one fails or is unavailable.

## 🏗️ Architecture Overview

The application follows a clean DDD architecture with clear separation of concerns:

```
src/
├── domain/                    # 🧠 Business logic and rules
│   ├── models/               # Domain entities and value objects
│   ├── services/             # Domain services
│   ├── interfaces/           # Domain interfaces (ports)
│   └── exceptions.py         # Domain-specific exceptions
├── application/              # 🎯 Use case orchestration
│   ├── services/             # Application services
│   ├── dtos/                 # Data transfer objects
│   └── error_handling/       # Application error handling
├── infrastructure/           # 🔧 External concerns
│   ├── tts/                  # TTS provider implementations
│   ├── stt/                  # STT provider implementations
│   ├── translation/          # Translation service implementations
│   ├── base/                 # Provider base classes
│   └── config/               # Configuration and DI container
└── presentation/             # 🖥️ UI layer
    └── (Streamlit app in app.py)
```

### 🔄 Data Flow

```mermaid
graph TD
    A[User Upload] --> B[Presentation Layer]
    B --> C[Application Service]
    C --> D[Domain Service]
    D --> E[STT Provider]
    D --> F[Translation Provider]
    D --> G[TTS Provider]
    E --> H[Infrastructure Layer]
    F --> H
    G --> H
    H --> I[External Services]
```
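
The flow above can be sketched as three ports that a domain service calls in sequence. This is a minimal illustration rather than the project's actual code: the `Protocol` names, method names, `PipelineResult` fields, and the stand-in providers are all hypothetical.

```python
from dataclasses import dataclass
from typing import Protocol


# Hypothetical ports; the real interfaces live under src/domain/interfaces/.
class SpeechToText(Protocol):
    def transcribe(self, audio: bytes) -> str: ...


class Translator(Protocol):
    def translate(self, text: str, target_language: str) -> str: ...


class TextToSpeech(Protocol):
    def synthesize(self, text: str) -> bytes: ...


@dataclass
class PipelineResult:
    original_text: str
    translated_text: str
    audio: bytes


def run_pipeline(
    audio: bytes,
    stt: SpeechToText,
    translator: Translator,
    tts: TextToSpeech,
    target_language: str,
) -> PipelineResult:
    """Run STT, then translation, then TTS, passing each output forward."""
    original = stt.transcribe(audio)
    translated = translator.translate(original, target_language)
    return PipelineResult(original, translated, tts.synthesize(translated))


# Trivial stand-in providers so the sketch runs without any models.
class EchoStt:
    def transcribe(self, audio: bytes) -> str:
        return audio.decode("utf-8")


class UpperTranslator:
    def translate(self, text: str, target_language: str) -> str:
        return f"[{target_language}] {text.upper()}"


class BytesTts:
    def synthesize(self, text: str) -> bytes:
        return text.encode("utf-8")


result = run_pipeline(b"hello", EchoStt(), UpperTranslator(), BytesTts(), "zh")
print(result.translated_text)
```

Because `run_pipeline` only depends on the three protocols, any provider that offers the matching method can be passed in without changing the orchestration.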

## 🚀 Quick Start

### Prerequisites

- Python 3.9+
- FFmpeg (for audio processing)
- Optional: CUDA for GPU acceleration

### Installation

1. **Clone the repository:**
   ```bash
   git clone <repository-url>
   cd audio-translation-system
   ```

2. **Install dependencies:**
   ```bash
   pip install -r requirements.txt
   ```

3. **Run the application:**
   ```bash
   streamlit run app.py
   ```

4. **Access the web interface:**
   Open your browser to `http://localhost:8501`

## 🎛️ Supported Providers

### Speech-to-Text (STT)
- **Whisper** (Default) - OpenAI's Whisper Large-v3 model
- **Parakeet** - NVIDIA's Parakeet-TDT-0.6B model (requires NeMo Toolkit)

### Translation
- **NLLB** - Meta's No Language Left Behind model

### Text-to-Speech (TTS)
- **Kokoro** - High-quality neural TTS
- **Dia** - Fast neural TTS
- **CosyVoice2** - Advanced voice synthesis
- **Dummy** - Test provider for development

## 📖 Usage

### Web Interface

1. **Upload Audio**: WAV, MP3, FLAC, and OGG files are supported
2. **Select Model**: Choose STT model (Whisper/Parakeet)
3. **Choose Language**: Select target translation language
4. **Pick Voice**: Select TTS voice and speed
5. **Process**: Click to start the translation pipeline

### Programmatic Usage

```python
import os

from src.infrastructure.config.container_setup import initialize_global_container
from src.application.services.audio_processing_service import AudioProcessingApplicationService
from src.application.dtos.processing_request_dto import ProcessingRequestDto
from src.application.dtos.audio_upload_dto import AudioUploadDto

# Initialize dependency container
container = initialize_global_container()
audio_service = container.resolve(AudioProcessingApplicationService)

# Create request
with open("audio.wav", "rb") as f:
    audio_upload = AudioUploadDto(
        filename="audio.wav",
        content=f.read(),
        content_type="audio/wav",
        size=os.path.getsize("audio.wav")
    )

request = ProcessingRequestDto(
    audio=audio_upload,
    asr_model="whisper-small",
    target_language="zh",
    voice="kokoro",
    speed=1.0
)

# Process audio
result = audio_service.process_audio_pipeline(request)

if result.success:
    print(f"Original: {result.original_text}")
    print(f"Translated: {result.translated_text}")
    print(f"Audio saved to: {result.audio_path}")
else:
    print(f"Error: {result.error_message}")
```

## 🧪 Testing

The test suite is split into unit and integration tests and runs with pytest:

```bash
# Run all tests
python -m pytest

# Run specific test categories
python -m pytest tests/unit/          # Unit tests
python -m pytest tests/integration/   # Integration tests

# Run with coverage
python -m pytest --cov=src --cov-report=html
```

### Test Structure
- **Unit Tests**: Test individual components in isolation
- **Integration Tests**: Test provider integrations and the complete pipeline
- **Mocking**: Uses dependency injection for easy mocking
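
As an illustration of the mocking approach, the sketch below injects a `unittest.mock.Mock` in place of a real translation provider. The `TranslateTextUseCase` class is hypothetical and exists only to show the pattern; the project's own services will differ.

```python
from unittest.mock import Mock


class TranslateTextUseCase:
    """Hypothetical use case that receives its translator via the constructor."""

    def __init__(self, translator):
        self._translator = translator

    def execute(self, text: str, target_language: str) -> str:
        if not text.strip():
            raise ValueError("text must not be empty")
        return self._translator.translate(text, target_language)


def test_use_case_delegates_to_injected_translator():
    # The mock replaces the real provider, so no model is loaded.
    translator = Mock()
    translator.translate.return_value = "你好"

    use_case = TranslateTextUseCase(translator)

    assert use_case.execute("hello", "zh") == "你好"
    translator.translate.assert_called_once_with("hello", "zh")
```

Since the dependency arrives through the constructor, the test needs no patching of module globals.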

## 🔧 Configuration

### Environment Variables

Create a `.env` file or set environment variables:

```bash
# Provider preferences (comma-separated, in order of preference)
TTS_PROVIDERS=kokoro,dia,cosyvoice2,dummy
STT_PROVIDERS=whisper,parakeet
TRANSLATION_PROVIDERS=nllb

# Logging
LOG_LEVEL=INFO
LOG_FILE=app.log

# Performance
MAX_FILE_SIZE_MB=100
TEMP_FILE_CLEANUP_HOURS=24
```
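
One way a comma-separated preference list like `TTS_PROVIDERS` could be read is sketched below. The helper name `provider_order` is hypothetical, and the sketch reads only the process environment; loading a `.env` file would need a separate step that is not shown here.

```python
import os


def provider_order(var_name: str, default: str) -> list[str]:
    """Return provider names from an environment variable, in preference order.

    Hypothetical helper: whitespace is trimmed, names are lower-cased,
    and empty entries (for example from a trailing comma) are dropped.
    """
    raw = os.environ.get(var_name, default)
    return [name.strip().lower() for name in raw.split(",") if name.strip()]


tts_order = provider_order("TTS_PROVIDERS", "kokoro,dia,cosyvoice2,dummy")
print(tts_order)
```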

### Provider Configuration

The system detects which providers are available and, when one fails, falls back to the next in the preferred order. The order can also be set in code:

```python
# Example: Custom provider configuration
from src.infrastructure.config.dependency_container import DependencyContainer

container = DependencyContainer()
container.configure_tts_providers(['kokoro', 'dummy'])  # Preferred order
```
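
The fallback behavior itself can be pictured as a loop over providers in preferred order. The sketch below is an assumption about the general technique, not the container's actual implementation; `synthesize_with_fallback` and `AllProvidersFailed` are hypothetical names.

```python
from typing import Callable, Sequence


class AllProvidersFailed(RuntimeError):
    """Raised when every provider in the list has failed."""


def synthesize_with_fallback(
    text: str,
    providers: Sequence[tuple[str, Callable[[str], bytes]]],
) -> tuple[str, bytes]:
    """Try each (name, synthesize) pair in order; return the first success."""
    errors = []
    for name, synthesize in providers:
        try:
            return name, synthesize(text)
        except Exception as exc:  # any provider failure triggers fallback
            errors.append(f"{name}: {exc}")
    raise AllProvidersFailed("; ".join(errors))


def broken_provider(text: str) -> bytes:
    raise RuntimeError("model not installed")


def dummy_provider(text: str) -> bytes:
    return text.encode("utf-8")


used, audio = synthesize_with_fallback(
    "hi", [("kokoro", broken_provider), ("dummy", dummy_provider)]
)
print(used)
```

Collecting the per-provider errors keeps the final exception informative when nothing succeeds.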

## 🏗️ Architecture Benefits

### 🎯 Domain-Driven Design
- **Clear Business Logic**: Domain layer contains pure business rules
- **Separation of Concerns**: Each layer has distinct responsibilities
- **Testability**: Easy to test business logic independently

### 🔌 Dependency Injection
- **Loose Coupling**: Components depend on abstractions, not implementations
- **Easy Testing**: Mock dependencies for unit testing
- **Flexibility**: Swap implementations without changing business logic

### 🛡️ Error Handling
- **Layered Exceptions**: Domain exceptions mapped to user-friendly messages
- **Graceful Fallbacks**: Automatic provider fallback on failures
- **Structured Logging**: Correlation IDs and detailed error tracking
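
A minimal sketch of mapping domain exceptions to user-facing messages follows. The exception classes and message strings are hypothetical placeholders; the project's real exceptions are defined in `src/domain/exceptions.py`.

```python
class DomainError(Exception):
    """Hypothetical base class for domain-level failures."""


class UnsupportedAudioFormat(DomainError):
    pass


class ProviderUnavailable(DomainError):
    pass


# Lookup table from exception type to a message that is safe to show in the UI.
_USER_MESSAGES: dict[type[Exception], str] = {
    UnsupportedAudioFormat: "This audio format is not supported.",
    ProviderUnavailable: "The selected service is currently unavailable.",
}


def to_user_message(exc: Exception) -> str:
    """Translate an exception into a user-friendly message."""
    for exc_type, message in _USER_MESSAGES.items():
        if isinstance(exc, exc_type):
            return message
    # Unknown errors get a generic message; details belong in the logs.
    return "An unexpected error occurred."


print(to_user_message(ProviderUnavailable("kokoro failed to load")))
```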

### 📈 Extensibility
- **Plugin Architecture**: Add new providers by implementing interfaces
- **Configuration-Driven**: Change behavior through configuration
- **Provider Factories**: Automatic provider discovery and instantiation
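
The plugin idea can be illustrated with a small registry keyed by provider name. This is a sketch of the general pattern under assumed names (`TtsProvider`, `register`, `create_provider`); the project's real interfaces and factories may look different, and DEVELOPER_GUIDE.md remains the reference for adding providers.

```python
from abc import ABC, abstractmethod


class TtsProvider(ABC):
    """Hypothetical TTS interface that each plugin implements."""

    @abstractmethod
    def synthesize(self, text: str) -> bytes: ...


_REGISTRY: dict[str, type[TtsProvider]] = {}


def register(name: str):
    """Class decorator that records a provider under the given name."""
    def decorator(cls: type[TtsProvider]) -> type[TtsProvider]:
        _REGISTRY[name] = cls
        return cls
    return decorator


@register("dummy")
class DummyTts(TtsProvider):
    def synthesize(self, text: str) -> bytes:
        # One zero byte per character stands in for real audio.
        return b"\x00" * len(text)


def create_provider(name: str) -> TtsProvider:
    """Instantiate a registered provider, or fail with a clear error."""
    try:
        return _REGISTRY[name]()
    except KeyError:
        raise ValueError(f"unknown TTS provider: {name}") from None


provider = create_provider("dummy")
print(len(provider.synthesize("hello")))
```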

## 🔍 Troubleshooting

### Common Issues

**Import Errors:**
```bash
# Ensure all dependencies are installed
pip install -r requirements.txt

# For Parakeet support
pip install 'nemo_toolkit[asr]'
```

**Audio Processing Errors:**
- Verify FFmpeg is installed and in PATH
- Check audio file format is supported
- Ensure sufficient disk space for temporary files
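
To confirm from Python that FFmpeg is reachable on `PATH`, a quick check like the following may help. The function name `ffmpeg_version` is hypothetical; it relies only on the standard library and on the `ffmpeg -version` command.

```python
import shutil
import subprocess
from typing import Optional


def ffmpeg_version() -> Optional[str]:
    """Return the first line of `ffmpeg -version`, or None if FFmpeg is not found."""
    path = shutil.which("ffmpeg")
    if path is None:
        return None
    completed = subprocess.run(
        [path, "-version"], capture_output=True, text=True, check=False
    )
    lines = completed.stdout.splitlines()
    return lines[0] if lines else None


version = ffmpeg_version()
print(version or "FFmpeg not found on PATH")
```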

**Provider Unavailable:**
- Check provider-specific dependencies
- Review logs for detailed error messages
- Verify provider configuration

### Debug Mode

Enable detailed logging:
```python
import logging
logging.basicConfig(level=logging.DEBUG)
```

## 🤝 Contributing

### Adding New Providers

See [DEVELOPER_GUIDE.md](DEVELOPER_GUIDE.md) for detailed instructions on:
- Implementing new TTS providers
- Adding STT models
- Extending translation services
- Writing tests

### Development Setup

1. **Install development dependencies:**
   ```bash
   pip install -r requirements-dev.txt
   ```

2. **Run pre-commit hooks:**
   ```bash
   pre-commit install
   pre-commit run --all-files
   ```

3. **Run tests before committing:**
   ```bash
   python -m pytest tests/
   ```

## 📄 License

This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.

---

For detailed developer documentation, see [DEVELOPER_GUIDE.md](DEVELOPER_GUIDE.md)