
Michael Hu

add missing huggingface config back in readme

f75389c 4 months ago

	---
	title: TeachingAssistant
	emoji: 🚀
	colorFrom: gray
	colorTo: blue
	sdk: streamlit
	sdk_version: 1.44.1
	app_file: app.py
	pinned: false
	---

	Check out the configuration reference at <https://huggingface.co/docs/hub/spaces-config-reference>

	# Audio Translation System

	An audio translation system built on Domain-Driven Design (DDD) principles. The application processes audio through a three-stage pipeline: Speech-to-Text (STT) → Translation → Text-to-Speech (TTS). Each stage is backed by interchangeable providers, with automatic fallback to the next configured provider when one fails.

	## 🏗️ Architecture Overview

	The application follows a clean DDD architecture with clear separation of concerns:

	```
	src/
	├── domain/                 # 🧠 Business logic and rules
	│   ├── models/             # Domain entities and value objects
	│   ├── services/           # Domain services
	│   ├── interfaces/         # Domain interfaces (ports)
	│   └── exceptions.py       # Domain-specific exceptions
	├── application/            # 🎯 Use case orchestration
	│   ├── services/           # Application services
	│   ├── dtos/               # Data transfer objects
	│   └── error_handling/     # Application error handling
	├── infrastructure/         # 🔧 External concerns
	│   ├── tts/                # TTS provider implementations
	│   ├── stt/                # STT provider implementations
	│   ├── translation/        # Translation service implementations
	│   ├── base/               # Provider base classes
	│   └── config/             # Configuration and DI container
	└── presentation/           # 🖥️ UI layer
	    └── (Streamlit app in app.py)
	```

	### 🔄 Data Flow

	```mermaid
	graph TD
	A[User Upload] --> B[Presentation Layer]
	B --> C[Application Service]
	C --> D[Domain Service]
	D --> E[STT Provider]
	D --> F[Translation Provider]
	D --> G[TTS Provider]
	E --> H[Infrastructure Layer]
	F --> H
	G --> H
	H --> I[External Services]
	```
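
The flow above can be sketched in Python as three ports chained by one function. This is a minimal illustration of the idea, not the project's code: `SpeechToText`, `Translator`, `TextToSpeech`, `PipelineResult`, and `translate_audio` are hypothetical names and do not necessarily match what lives in `src/domain/interfaces/`.

```python
from dataclasses import dataclass
from typing import Protocol


class SpeechToText(Protocol):
    """Hypothetical STT port: audio bytes in, transcript out."""

    def transcribe(self, audio: bytes) -> str: ...


class Translator(Protocol):
    """Hypothetical translation port."""

    def translate(self, text: str, target_language: str) -> str: ...


class TextToSpeech(Protocol):
    """Hypothetical TTS port: text in, audio bytes out."""

    def synthesize(self, text: str) -> bytes: ...


@dataclass
class PipelineResult:
    original_text: str
    translated_text: str
    audio: bytes


def translate_audio(
    audio: bytes,
    target_language: str,
    stt: SpeechToText,
    translator: Translator,
    tts: TextToSpeech,
) -> PipelineResult:
    """Run STT, then translation, then TTS, passing each output to the next stage."""
    original = stt.transcribe(audio)
    translated = translator.translate(original, target_language)
    return PipelineResult(original, translated, tts.synthesize(translated))
```

Because the function depends only on the three protocols, any object with the matching method can be passed in, which is what lets the infrastructure layer swap providers without touching this logic.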

	## 🚀 Quick Start

	### Prerequisites

	- Python 3.9+
	- FFmpeg (for audio processing)
	- Optional: CUDA for GPU acceleration

	### Installation

	1. Clone the repository:
	```bash
	git clone <repository-url>
	cd audio-translation-system
	```

	2. Install dependencies:
	```bash
	pip install -r requirements.txt
	```

	3. Run the application:
	```bash
	streamlit run app.py
	```

	4. Access the web interface:
	Open your browser to `http://localhost:8501`

	## 🎛️ Supported Providers

	### Speech-to-Text (STT)
	- Whisper (Default) - OpenAI's Whisper Large-v3 model
	- Parakeet - NVIDIA's Parakeet-TDT-0.6B model (requires NeMo Toolkit)

	### Translation
	- NLLB - Meta's No Language Left Behind model

	### Text-to-Speech (TTS)
	- Kokoro - High-quality neural TTS
	- Dia - Fast neural TTS
	- CosyVoice2 - Advanced voice synthesis
	- Dummy - Test provider for development

	## 📖 Usage

	### Web Interface

	1. Upload Audio: Upload a WAV, MP3, FLAC, or OGG file
	2. Select Model: Choose STT model (Whisper/Parakeet)
	3. Choose Language: Select target translation language
	4. Pick Voice: Select TTS voice and speed
	5. Process: Click to start the translation pipeline

	### Programmatic Usage

	```python
	import os

	from src.infrastructure.config.container_setup import initialize_global_container
	from src.application.services.audio_processing_service import AudioProcessingApplicationService
	from src.application.dtos.processing_request_dto import ProcessingRequestDto
	from src.application.dtos.audio_upload_dto import AudioUploadDto

	# Initialize the dependency container and resolve the application service
	container = initialize_global_container()
	audio_service = container.resolve(AudioProcessingApplicationService)

	# Build the upload DTO from a local file
	with open("audio.wav", "rb") as f:
	    audio_upload = AudioUploadDto(
	        filename="audio.wav",
	        content=f.read(),
	        content_type="audio/wav",
	        size=os.path.getsize("audio.wav"),
	    )

	# Describe the requested processing: STT model, target language, voice, speed
	request = ProcessingRequestDto(
	    audio=audio_upload,
	    asr_model="whisper-small",
	    target_language="zh",
	    voice="kokoro",
	    speed=1.0,
	)

	# Run the STT → Translation → TTS pipeline
	result = audio_service.process_audio_pipeline(request)

	if result.success:
	    print(f"Original: {result.original_text}")
	    print(f"Translated: {result.translated_text}")
	    print(f"Audio saved to: {result.audio_path}")
	else:
	    print(f"Error: {result.error_message}")
	```

	## 🧪 Testing

	The test suite is split into unit and integration tests and runs with pytest:

	```bash
	# Run all tests
	python -m pytest

	# Run specific test categories
	python -m pytest tests/unit/ # Unit tests
	python -m pytest tests/integration/ # Integration tests

	# Run with coverage
	python -m pytest --cov=src --cov-report=html
	```

	### Test Structure
	- Unit Tests: Test individual components in isolation
	- Integration Tests: Test provider integrations and complete pipeline
	- Mocking: Uses dependency injection for easy mocking
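
To illustrate the mocking point, a unit test can pass a stub into the component under test instead of loading a real model. The sketch below is an assumption about the general pattern: `TranslationUseCase` and its `translator` argument are hypothetical stand-ins, not classes from this repository.

```python
from unittest.mock import Mock


class TranslationUseCase:
    """Hypothetical component that receives its translator through the constructor."""

    def __init__(self, translator):
        self._translator = translator

    def run(self, text: str, target_language: str) -> str:
        if not text.strip():
            raise ValueError("text must not be empty")
        return self._translator.translate(text, target_language)


def test_run_delegates_to_translator():
    # The Mock stands in for a real translation provider
    translator = Mock()
    translator.translate.return_value = "你好"

    result = TranslationUseCase(translator).run("hello", "zh")

    assert result == "你好"
    translator.translate.assert_called_once_with("hello", "zh")
```

A test like this runs without model weights or network access, so it fits the unit-test category rather than the integration one.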

	## 🔧 Configuration

	### Environment Variables

	Create a `.env` file or set environment variables:

	```bash
	# Provider preferences (comma-separated, in order of preference)
	TTS_PROVIDERS=kokoro,dia,cosyvoice2,dummy
	STT_PROVIDERS=whisper,parakeet
	TRANSLATION_PROVIDERS=nllb

	# Logging
	LOG_LEVEL=INFO
	LOG_FILE=app.log

	# Performance
	MAX_FILE_SIZE_MB=100
	TEMP_FILE_CLEANUP_HOURS=24
	```
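
How the application reads these variables is not shown in this README. As a minimal sketch, assuming the provider lists are split on commas in order and the size limit is parsed as an integer, the parsing could look like this. `parse_provider_list` and `load_settings` are hypothetical helpers, and the defaults simply mirror the example values above.

```python
import os


def parse_provider_list(raw: str) -> list[str]:
    """Split a comma-separated value, keeping order and dropping blank entries."""
    return [name.strip().lower() for name in raw.split(",") if name.strip()]


def load_settings(env=os.environ) -> dict:
    """Collect the settings from a mapping (defaults to the process environment)."""
    return {
        "tts_providers": parse_provider_list(
            env.get("TTS_PROVIDERS", "kokoro,dia,cosyvoice2,dummy")
        ),
        "stt_providers": parse_provider_list(
            env.get("STT_PROVIDERS", "whisper,parakeet")
        ),
        "translation_providers": parse_provider_list(
            env.get("TRANSLATION_PROVIDERS", "nllb")
        ),
        "log_level": env.get("LOG_LEVEL", "INFO").upper(),
        # Convert megabytes to bytes once, so callers compare against file sizes directly
        "max_file_size_bytes": int(env.get("MAX_FILE_SIZE_MB", "100")) * 1024 * 1024,
    }
```

Accepting the mapping as a parameter keeps the function easy to test with a plain dictionary.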

	### Provider Configuration

	The system detects which providers are available and, when one fails, falls back to the next provider in the configured order:

	```python
	# Example: Custom provider configuration
	from src.infrastructure.config.dependency_container import DependencyContainer

	container = DependencyContainer()
	container.configure_tts_providers(['kokoro', 'dummy']) # Preferred order
	```
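
The fallback behaviour can be pictured as trying each provider in preference order until one succeeds. The sketch below shows that general technique only and is not the container's actual logic: `synthesize_with_fallback` and the `(name, callable)` pairs are made up for illustration.

```python
from typing import Callable, Iterable, Tuple


def synthesize_with_fallback(
    text: str,
    providers: Iterable[Tuple[str, Callable[[str], bytes]]],
) -> Tuple[str, bytes]:
    """Try each (name, synthesize) pair in order and return the first success."""
    errors = []
    for name, synthesize in providers:
        try:
            return name, synthesize(text)
        except Exception as exc:  # broad on purpose: any provider failure triggers fallback
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all providers failed: " + "; ".join(errors))
```

Collecting the per-provider errors before raising keeps the final message useful when every provider in the list is unavailable.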

	## 🏗️ Architecture Benefits

	### 🎯 Domain-Driven Design
	- Clear Business Logic: Domain layer contains pure business rules
	- Separation of Concerns: Each layer has distinct responsibilities
	- Testability: Easy to test business logic independently

	### 🔌 Dependency Injection
	- Loose Coupling: Components depend on abstractions, not implementations
	- Easy Testing: Mock dependencies for unit testing
	- Flexibility: Swap implementations without changing business logic

	### 🛡️ Error Handling
	- Layered Exceptions: Domain exceptions mapped to user-friendly messages
	- Graceful Fallbacks: Automatic provider fallback on failures
	- Structured Logging: Correlation IDs and detailed error tracking
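
One way to combine these three points is to log the technical detail under a short correlation ID and show the user only a friendly message carrying the same ID. This is a sketch under assumptions: the exception classes, `USER_MESSAGES`, and `to_user_message` are hypothetical and are not taken from `src/domain/exceptions.py`.

```python
import logging
import uuid

logger = logging.getLogger("audio_translation")


class DomainError(Exception):
    """Hypothetical base class for domain-level failures."""


class UnsupportedAudioFormatError(DomainError):
    pass


class ProviderUnavailableError(DomainError):
    pass


# Hypothetical mapping from exception type to the text shown in the UI
USER_MESSAGES = {
    UnsupportedAudioFormatError: "This audio format is not supported.",
    ProviderUnavailableError: "The selected service is unavailable. Please try again later.",
}


def to_user_message(exc: Exception) -> str:
    """Log the full error with a correlation ID and return a user-facing message."""
    correlation_id = uuid.uuid4().hex[:8]
    logger.error("[%s] %s: %s", correlation_id, type(exc).__name__, exc)
    message = USER_MESSAGES.get(type(exc), "An unexpected error occurred.")
    return f"{message} (reference: {correlation_id})"
```

The reference printed to the user can then be searched for in the log file to find the original exception.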

	### 📈 Extensibility
	- Plugin Architecture: Add new providers by implementing interfaces
	- Configuration-Driven: Change behavior through configuration
	- Provider Factories: Automatic provider discovery and instantiation
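
A common way to get this kind of extensibility is a registry that maps provider names to classes, with a factory that instantiates them by name. The sketch below illustrates that pattern only. `TTSProvider`, `register`, `SilentTTS`, and `create_provider` are hypothetical names; the project's real base classes and factories are described in DEVELOPER_GUIDE.md.

```python
from abc import ABC, abstractmethod

# Maps a provider name to the class that implements it
_REGISTRY = {}


class TTSProvider(ABC):
    """Hypothetical interface that every TTS provider implements."""

    @abstractmethod
    def synthesize(self, text: str) -> bytes: ...


def register(name: str):
    """Class decorator that records a provider under the given name."""
    def decorator(cls):
        _REGISTRY[name] = cls
        return cls
    return decorator


@register("silent")
class SilentTTS(TTSProvider):
    """Toy provider that returns empty audio, useful only for illustration."""

    def synthesize(self, text: str) -> bytes:
        return b""


def create_provider(name: str) -> TTSProvider:
    """Instantiate a registered provider, or fail with a clear message."""
    try:
        return _REGISTRY[name]()
    except KeyError:
        raise ValueError(f"unknown TTS provider: {name!r}") from None
```

With this shape, adding a provider means writing one class and decorating it; the calling code keeps asking the factory for a name taken from configuration.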

	## 🔍 Troubleshooting

	### Common Issues

	Import Errors:
	```bash
	# Ensure all dependencies are installed
	pip install -r requirements.txt

	# For Parakeet support
	pip install 'nemo_toolkit[asr]'
	```

	Audio Processing Errors:
	- Verify FFmpeg is installed and in PATH
	- Check audio file format is supported
	- Ensure sufficient disk space for temporary files
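
The first and last checks can be scripted with the standard library. This is a small diagnostic sketch, not part of the application; `ffmpeg_version` and `free_disk_mb` are hypothetical helper names.

```python
import shutil
import subprocess
import tempfile


def ffmpeg_version():
    """Return the first line of `ffmpeg -version`, or None if ffmpeg is not on PATH."""
    path = shutil.which("ffmpeg")
    if path is None:
        return None
    result = subprocess.run(
        [path, "-version"], capture_output=True, text=True, check=False
    )
    lines = result.stdout.splitlines()
    return lines[0] if lines else None


def free_disk_mb(path=None) -> int:
    """Return free space in MiB for the given path (defaults to the temp directory)."""
    usage = shutil.disk_usage(path or tempfile.gettempdir())
    return usage.free // (1024 * 1024)


if __name__ == "__main__":
    print("FFmpeg:", ffmpeg_version() or "not found on PATH")
    print("Free temp space (MiB):", free_disk_mb())
```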

	Provider Unavailable:
	- Check provider-specific dependencies
	- Review logs for detailed error messages
	- Verify provider configuration

	### Debug Mode

	Enable detailed logging:
	```python
	import logging
	logging.basicConfig(level=logging.DEBUG)
	```

	## 🤝 Contributing

	### Adding New Providers

	See [DEVELOPER_GUIDE.md](DEVELOPER_GUIDE.md) for detailed instructions on:
	- Implementing new TTS providers
	- Adding STT models
	- Extending translation services
	- Writing tests

	### Development Setup

	1. Install development dependencies:
	```bash
	pip install -r requirements-dev.txt
	```

	2. Run pre-commit hooks:
	```bash
	pre-commit install
	pre-commit run --all-files
	```

	3. Run tests before committing:
	```bash
	python -m pytest tests/
	```

	## 📄 License

	This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.

	---

	For detailed developer documentation, see [DEVELOPER_GUIDE.md](DEVELOPER_GUIDE.md)