---
title: TeachingAssistant
emoji: π
colorFrom: gray
colorTo: blue
sdk: streamlit
sdk_version: 1.44.1
app_file: app.py
pinned: false
---

Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
# Speech Recognition Module Refactoring

## Overview

The speech recognition module (`utils/stt.py`) has been refactored to support multiple ASR (Automatic Speech Recognition) models. The implementation now follows a factory pattern that allows easy switching between speech recognition models while maintaining a consistent interface.

## Supported Models

### 1. Whisper (Default)

- Based on OpenAI's Whisper Large-v3 model
- High accuracy for general speech recognition
- No additional installation required

### 2. Parakeet

- NVIDIA's Parakeet-TDT-0.6B model
- Optimized for real-time transcription
- Requires additional installation (see below)
## Installation

### For Parakeet Support

To use the Parakeet model, install the NeMo Toolkit:

```bash
pip install -U 'nemo_toolkit[asr]'
```

Alternatively, use the provided requirements file:

```bash
pip install -r requirements-parakeet.txt
```
## Usage

### In the Web Application

The web application now includes a dropdown menu for selecting the ASR model. Simply choose your preferred model before uploading an audio file.
### Programmatic Usage

```python
from utils.stt import transcribe_audio

# Using the default Whisper model
text = transcribe_audio("path/to/audio.wav")

# Using the Parakeet model
text = transcribe_audio("path/to/audio.wav", model_name="parakeet")
```
### Direct Model Access

For more advanced usage, you can access the model classes directly:

```python
from utils.stt import ASRFactory

# Get a specific model instance
whisper_model = ASRFactory.get_model("whisper")
parakeet_model = ASRFactory.get_model("parakeet")

# Use the model directly
text = whisper_model.transcribe("path/to/audio.wav")
```
## Architecture

The refactored code follows these design patterns:

1. **Abstract Base Class**: `ASRModel` defines the interface that all speech recognition models implement
2. **Factory Pattern**: `ASRFactory` creates the appropriate model instance based on the requested model name
3. **Strategy Pattern**: different model implementations can be swapped at runtime

This architecture makes it easy to add support for additional ASR models in the future.
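Combining the three patterns, the skeleton of `utils/stt.py` plausibly looks like the sketch below (the class bodies are illustrative stand-ins, not the repository's actual code):

```python
from abc import ABC, abstractmethod


class ASRModel(ABC):
    """Interface that every speech recognition backend implements."""

    @abstractmethod
    def transcribe(self, audio_path: str) -> str:
        """Return the transcript of the audio file at audio_path."""


class WhisperASR(ASRModel):
    def transcribe(self, audio_path: str) -> str:
        # Placeholder: the real class would run the Whisper pipeline here.
        return f"[whisper transcript of {audio_path}]"


class ParakeetASR(ASRModel):
    def transcribe(self, audio_path: str) -> str:
        # Placeholder: the real class would run NeMo's Parakeet model here.
        return f"[parakeet transcript of {audio_path}]"


class ASRFactory:
    _registry = {"whisper": WhisperASR, "parakeet": ParakeetASR}

    @classmethod
    def get_model(cls, name: str) -> ASRModel:
        try:
            return cls._registry[name]()
        except KeyError:
            raise ValueError(f"Unknown ASR model: {name!r}") from None
```

Adding a new backend is then just a matter of subclassing `ASRModel` and registering the class in the factory.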