
title: TeachingAssistant
emoji: 🚀
colorFrom: gray
colorTo: blue
sdk: gradio
app_file: app.py
pinned: false
sdk_version: 5.47.2

Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

Audio Translation System

An audio translation system built on Domain-Driven Design (DDD) principles. The application processes audio through a three-stage pipeline: Speech-to-Text (STT) → Translation → Text-to-Speech (TTS). Each stage is backed by interchangeable providers, and the system falls back automatically to the next configured provider when one fails or is unavailable.

🏗️ Architecture Overview

The application follows a clean DDD architecture with clear separation of concerns:

src/
├── domain/                    # 🧠 Business logic and rules
│   ├── models/               # Domain entities and value objects
│   ├── services/             # Domain services
│   ├── interfaces/           # Domain interfaces (ports)
│   └── exceptions.py         # Domain-specific exceptions
├── application/              # 🎯 Use case orchestration
│   ├── services/             # Application services
│   ├── dtos/                 # Data transfer objects
│   └── error_handling/       # Application error handling
├── infrastructure/           # 🔧 External concerns
│   ├── tts/                  # TTS provider implementations
│   ├── stt/                  # STT provider implementations
│   ├── translation/          # Translation service implementations
│   ├── base/                 # Provider base classes
│   └── config/               # Configuration and DI container
└── presentation/             # 🖥️ UI layer
    └── (Streamlit app in app.py)
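
To make the layering concrete, here is a minimal sketch of how a domain interface (port) and an infrastructure implementation might relate. The names `SpeechRecognizer`, `EchoRecognizer`, and `describe` are hypothetical and chosen for illustration; they are not the project's actual classes.

```python
from abc import ABC, abstractmethod


class SpeechRecognizer(ABC):
    """Hypothetical domain port: what the domain needs from any STT provider."""

    @abstractmethod
    def transcribe(self, audio: bytes) -> str:
        """Return the text recognized in the given audio bytes."""


class EchoRecognizer(SpeechRecognizer):
    """Hypothetical infrastructure adapter; a real one would wrap a model."""

    def transcribe(self, audio: bytes) -> str:
        # Stand-in behavior so the sketch runs without any model installed.
        return f"<{len(audio)} bytes of audio>"


def describe(recognizer: SpeechRecognizer, audio: bytes) -> str:
    # Domain code depends only on the abstract port, never on a concrete class.
    return recognizer.transcribe(audio)
```

Because `describe` only knows about the abstract port, any class under `infrastructure/` that implements `transcribe` could be passed in without touching domain code.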

🔄 Data Flow

graph TD
    A[User Upload] --> B[Presentation Layer]
    B --> C[Application Service]
    C --> D[Domain Service]
    D --> E[STT Provider]
    D --> F[Translation Provider]
    D --> G[TTS Provider]
    E --> H[Infrastructure Layer]
    F --> H
    G --> H
    H --> I[External Services]
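
The flow in the diagram can be read as three functions composed in order. The sketch below assumes each provider is a plain callable; `PipelineResult` and `run_pipeline` are hypothetical names, and the lambdas are stand-ins rather than real STT, translation, or TTS models.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class PipelineResult:
    """Hypothetical container for the output of each pipeline stage."""
    original_text: str
    translated_text: str
    audio: bytes


def run_pipeline(
    audio: bytes,
    transcribe: Callable[[bytes], str],
    translate: Callable[[str], str],
    synthesize: Callable[[str], bytes],
) -> PipelineResult:
    # Each stage consumes the previous stage's output: STT, then translation, then TTS.
    original_text = transcribe(audio)
    translated_text = translate(original_text)
    return PipelineResult(original_text, translated_text, synthesize(translated_text))


# Stand-in providers so the sketch runs without any models installed.
result = run_pipeline(
    b"\x00\x01",
    transcribe=lambda audio: "hello",
    translate=lambda text: text.upper(),
    synthesize=lambda text: text.encode("utf-8"),
)
```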

🚀 Quick Start

Prerequisites

Python 3.9+
FFmpeg (for audio processing)
Optional: CUDA for GPU acceleration

Installation

Clone the repository:

git clone <repository-url>
cd audio-translation-system

Install dependencies:
```
pip install -r requirements.txt
```
Run the application:
```
streamlit run app.py
```
Access the web interface: Open your browser to http://localhost:8501

🎛️ Supported Providers

Speech-to-Text (STT)

Whisper (Default) - OpenAI's Whisper Large-v3 model
Parakeet - NVIDIA's Parakeet-TDT-0.6B model (requires NeMo Toolkit)

Translation

NLLB - Meta's No Language Left Behind model

Text-to-Speech (TTS)

Chatterbox - High-quality neural TTS provider

📖 Usage

Web Interface

Upload Audio: WAV, MP3, FLAC, and OGG files are supported
Select Model: Choose STT model (Whisper/Parakeet)
Choose Language: Select target translation language
Pick Voice: Select TTS voice and speed
Process: Click to start the translation pipeline

Programmatic Usage

import os

from src.infrastructure.config.container_setup import initialize_global_container
from src.application.services.audio_processing_service import AudioProcessingApplicationService
from src.application.dtos.processing_request_dto import ProcessingRequestDto
from src.application.dtos.audio_upload_dto import AudioUploadDto

# Initialize dependency container
container = initialize_global_container()
audio_service = container.resolve(AudioProcessingApplicationService)

# Create request
with open("audio.wav", "rb") as f:
    audio_upload = AudioUploadDto(
        filename="audio.wav",
        content=f.read(),
        content_type="audio/wav",
        size=os.path.getsize("audio.wav")
    )

request = ProcessingRequestDto(
    audio=audio_upload,
    asr_model="whisper-small",
    target_language="zh",
    voice="chatterbox",
    speed=1.0
)

# Process audio
result = audio_service.process_audio_pipeline(request)

if result.success:
    print(f"Original: {result.original_text}")
    print(f"Translated: {result.translated_text}")
    print(f"Audio saved to: {result.audio_path}")
else:
    print(f"Error: {result.error_message}")

🧪 Testing

The project includes unit and integration tests, run with pytest:

# Run all tests
python -m pytest

# Run specific test categories
python -m pytest tests/unit/          # Unit tests
python -m pytest tests/integration/   # Integration tests

# Run with coverage
python -m pytest --cov=src --cov-report=html

Test Structure

Unit Tests: Test individual components in isolation
Integration Tests: Test provider integrations and complete pipeline
Mocking: Uses dependency injection for easy mocking
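
As an illustration of the mocking point above, here is a sketch of a unit test that injects a `unittest.mock.Mock` in place of a real translation provider. `TranslateText` is a hypothetical use case written for this example, not a class from the project.

```python
from unittest.mock import Mock


class TranslateText:
    """Hypothetical use case that receives its translator through the constructor."""

    def __init__(self, translator):
        self._translator = translator

    def execute(self, text: str, target_language: str) -> str:
        if not text.strip():
            raise ValueError("text must not be empty")
        return self._translator.translate(text, target_language)


def test_execute_delegates_to_translator():
    # The mock stands in for a real provider, so no model is loaded.
    translator = Mock()
    translator.translate.return_value = "你好"

    use_case = TranslateText(translator)

    assert use_case.execute("hello", "zh") == "你好"
    translator.translate.assert_called_once_with("hello", "zh")
```

Since the dependency arrives through the constructor, the test needs no patching of module-level imports.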

🔧 Configuration

Environment Variables

Create a .env file or set environment variables:

# Provider preferences (comma-separated, in order of preference)
TTS_PROVIDERS=chatterbox
STT_PROVIDERS=whisper,parakeet
TRANSLATION_PROVIDERS=nllb

# Logging
LOG_LEVEL=INFO
LOG_FILE=app.log

# Performance
MAX_FILE_SIZE_MB=100
TEMP_FILE_CLEANUP_HOURS=24
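
A comma-separated preference list like the ones above could be parsed along these lines. This is a sketch only: `provider_preferences` is a hypothetical helper, and how the application actually reads these variables is not shown in this README.

```python
import os
from typing import List, Mapping


def provider_preferences(
    variable: str, default: str, environ: Mapping[str, str] = os.environ
) -> List[str]:
    """Return provider names in order of preference, ignoring blanks and case."""
    raw = environ.get(variable, default)
    return [name.strip().lower() for name in raw.split(",") if name.strip()]


# Passing a dict instead of os.environ keeps the example deterministic.
stt_order = provider_preferences(
    "STT_PROVIDERS", "whisper", {"STT_PROVIDERS": "whisper, parakeet"}
)
```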

Provider Configuration

The system automatically detects which providers are available and, when one cannot be used, falls back to the next in the preferred order:

# Example: Custom provider configuration
from src.infrastructure.config.dependency_container import DependencyContainer

container = DependencyContainer()
container.configure_tts_providers(['chatterbox'])  # Preferred order
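
The fallback behavior can be pictured as trying each provider in preferred order until one succeeds. The following is a sketch under that assumption; `call_with_fallback` and `AllProvidersFailed` are hypothetical names, not the project's API.

```python
from typing import Callable, List, Sequence, Tuple


class AllProvidersFailed(Exception):
    """Hypothetical error raised when no provider in the list succeeds."""


def call_with_fallback(
    providers: Sequence[Tuple[str, Callable[[str], str]]], text: str
) -> Tuple[str, str]:
    """Try providers in order; return (provider_name, result) from the first success."""
    errors: List[str] = []
    for name, provider in providers:
        try:
            return name, provider(text)
        except Exception as exc:
            # Remember why this provider failed, then move on to the next one.
            errors.append(f"{name}: {exc}")
    raise AllProvidersFailed("; ".join(errors))


def unavailable(text: str) -> str:
    raise RuntimeError("model not installed")


# The first provider fails, so the second one handles the request.
used, output = call_with_fallback(
    [("primary", unavailable), ("backup", str.upper)], "hello"
)
```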

🏗️ Architecture Benefits

🎯 Domain-Driven Design

Clear Business Logic: Domain layer contains pure business rules
Separation of Concerns: Each layer has distinct responsibilities
Testability: Easy to test business logic independently

🔌 Dependency Injection

Loose Coupling: Components depend on abstractions, not implementations
Easy Testing: Mock dependencies for unit testing
Flexibility: Swap implementations without changing business logic
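
To show what these properties look like in practice, here is a deliberately small container sketch. It is not the project's `DependencyContainer`; `Container`, `Translator`, `ReversingTranslator`, and `TranslationUseCase` are hypothetical names used only for illustration.

```python
class Container:
    """Hypothetical minimal container with lazily built singletons."""

    def __init__(self):
        self._factories = {}
        self._instances = {}

    def register(self, interface, factory):
        # The factory receives the container so it can resolve its own dependencies.
        self._factories[interface] = factory

    def resolve(self, interface):
        # Build each dependency on first use and reuse it afterwards.
        if interface not in self._instances:
            if interface not in self._factories:
                raise KeyError(f"No registration for {interface.__name__}")
            self._instances[interface] = self._factories[interface](self)
        return self._instances[interface]


class Translator:
    def translate(self, text: str) -> str:
        raise NotImplementedError


class ReversingTranslator(Translator):
    def translate(self, text: str) -> str:
        return text[::-1]  # Stand-in for a real translation model.


class TranslationUseCase:
    def __init__(self, translator: Translator):
        self._translator = translator

    def run(self, text: str) -> str:
        return self._translator.translate(text)


container = Container()
container.register(Translator, lambda c: ReversingTranslator())
container.register(TranslationUseCase, lambda c: TranslationUseCase(c.resolve(Translator)))
use_case = container.resolve(TranslationUseCase)
```

Swapping `ReversingTranslator` for another implementation changes one `register` call and nothing in `TranslationUseCase`.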

🛡️ Error Handling

Layered Exceptions: Domain exceptions mapped to user-friendly messages
Graceful Fallbacks: Automatic provider fallback on failures
Structured Logging: Correlation IDs and detailed error tracking
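
One way to picture the exception mapping and correlation IDs is sketched below. The exception classes, message table, and `to_user_message` helper are hypothetical and written for this example; the project's real exceptions live in `src/domain/exceptions.py` and may differ.

```python
import logging
import uuid
from typing import Optional

logger = logging.getLogger("audio_translation")


class DomainError(Exception):
    """Hypothetical base class for domain-level errors."""


class UnsupportedAudioFormat(DomainError):
    pass


class ProviderUnavailable(DomainError):
    pass


# Technical exceptions are translated into messages that are safe to show to users.
USER_MESSAGES = {
    UnsupportedAudioFormat: "This audio format is not supported.",
    ProviderUnavailable: "The selected service is temporarily unavailable.",
}


def to_user_message(error: Exception, correlation_id: Optional[str] = None) -> str:
    """Log the technical detail and return a user-facing message with a reference ID."""
    correlation_id = correlation_id or uuid.uuid4().hex
    # The correlation ID ties the user-visible message to the detailed log entry.
    logger.error("correlation_id=%s error=%r", correlation_id, error)
    message = USER_MESSAGES.get(type(error), "Something went wrong while processing the audio.")
    return f"{message} (reference: {correlation_id})"
```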

📈 Extensibility

Plugin Architecture: Add new providers by implementing interfaces
Configuration-Driven: Change behavior through configuration
Provider Factories: Automatic provider discovery and instantiation
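
A registry-based factory is one common way to get this kind of plugin behavior. The sketch below assumes providers register themselves under a name via a decorator; `register_provider`, `create_provider`, and `UppercaseProvider` are hypothetical and not taken from the project's code.

```python
from typing import Callable, Dict, List

# Maps a provider name to the class (or factory) that builds it.
_PROVIDERS: Dict[str, Callable[[], object]] = {}


def register_provider(name: str):
    """Class decorator that makes a provider discoverable by name."""
    def decorator(cls):
        _PROVIDERS[name] = cls
        return cls
    return decorator


@register_provider("uppercase")
class UppercaseProvider:
    """Stand-in provider; a real one would wrap a TTS, STT, or translation model."""

    def process(self, text: str) -> str:
        return text.upper()


def available_providers() -> List[str]:
    return sorted(_PROVIDERS)


def create_provider(name: str):
    if name not in _PROVIDERS:
        raise ValueError(f"Unknown provider {name!r}; available: {available_providers()}")
    return _PROVIDERS[name]()
```

With this shape, adding a provider means defining a class and decorating it; configuration then selects it by name.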

🔍 Troubleshooting

Common Issues

Import Errors:

# Ensure all dependencies are installed
pip install -r requirements.txt

# For Parakeet support
pip install 'nemo_toolkit[asr]'

Audio Processing Errors:

Verify FFmpeg is installed and in PATH
Check audio file format is supported
Ensure sufficient disk space for temporary files
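
These three checks can be scripted with the standard library. The helper names below are hypothetical, and the extension list simply mirrors the formats named in the Usage section; the application's own validation may be stricter.

```python
import shutil
from pathlib import Path

# Formats listed in the Usage section of this README.
SUPPORTED_EXTENSIONS = {".wav", ".mp3", ".flac", ".ogg"}


def ffmpeg_available() -> bool:
    """True if an `ffmpeg` executable can be found on PATH."""
    return shutil.which("ffmpeg") is not None


def is_supported_audio(filename: str) -> bool:
    """Check the file extension only; this does not inspect the file's contents."""
    return Path(filename).suffix.lower() in SUPPORTED_EXTENSIONS


def free_disk_mb(path: str = ".") -> float:
    """Free space, in megabytes, on the filesystem that contains `path`."""
    return shutil.disk_usage(path).free / (1024 * 1024)


if __name__ == "__main__":
    print("FFmpeg on PATH:", ffmpeg_available())
    print("audio.wav supported:", is_supported_audio("audio.wav"))
    print(f"Free disk space: {free_disk_mb():.0f} MB")
```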

Provider Unavailable:

Check provider-specific dependencies
Review logs for detailed error messages
Verify provider configuration

Debug Mode

Enable detailed logging:

import logging
logging.basicConfig(level=logging.DEBUG)
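
If you would rather drive the level from the `LOG_LEVEL` variable shown under Configuration, a small helper along these lines could work. This is a sketch: `resolve_log_level` is a hypothetical function, and whether the application already reads `LOG_LEVEL` this way is an assumption.

```python
import logging
import os
from typing import Mapping


def resolve_log_level(environ: Mapping[str, str] = os.environ) -> int:
    """Translate LOG_LEVEL (e.g. "debug") into a logging constant, defaulting to INFO."""
    name = environ.get("LOG_LEVEL", "INFO").upper()
    level = getattr(logging, name, None)
    # Unknown or misspelled names fall back to INFO instead of raising.
    return level if isinstance(level, int) else logging.INFO


if __name__ == "__main__":
    logging.basicConfig(level=resolve_log_level())
    logging.getLogger(__name__).debug("Debug logging is enabled")
```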

🤝 Contributing

Adding New Providers

See DEVELOPER_GUIDE.md for detailed instructions on:

Implementing new TTS providers
Adding STT models
Extending translation services
Writing tests

Development Setup

Install development dependencies:
```
pip install -r requirements-dev.txt
```

Run pre-commit hooks:

pre-commit install
pre-commit run --all-files

Run tests before committing:
```
python -m pytest tests/
```

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

For detailed developer documentation, see DEVELOPER_GUIDE.md