---
title: TeachingAssistant
emoji: π
colorFrom: gray
colorTo: blue
sdk: streamlit
sdk_version: 1.44.1
app_file: app.py
pinned: false
---

Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
# Speech Recognition Module Refactoring

## Overview

The speech recognition module (`utils/stt.py`) has been refactored to support multiple ASR (Automatic Speech Recognition) models. The implementation now follows a factory pattern that allows easy switching between speech recognition models while maintaining a consistent interface.
## Supported Models

### 1. Whisper (Default)

- Based on OpenAI's Whisper Large-v3 model
- High accuracy for general speech recognition
- No additional installation required

### 2. Parakeet

- NVIDIA's Parakeet-TDT-0.6B model
- Optimized for real-time transcription
- Requires additional installation (see below)
## Installation

### For Parakeet Support

To use the Parakeet model, install the NeMo Toolkit:

```bash
pip install -U 'nemo_toolkit[asr]'
```

Alternatively, use the provided requirements file:

```bash
pip install -r requirements-parakeet.txt
```
## Usage

### In the Web Application

The web application now includes a dropdown menu for selecting the ASR model. Simply choose your preferred model before uploading an audio file.

### Programmatic Usage

```python
from utils.stt import transcribe_audio

# Using the default Whisper model
text = transcribe_audio("path/to/audio.wav")

# Using the Parakeet model
text = transcribe_audio("path/to/audio.wav", model_name="parakeet")
```
### Direct Model Access

For more advanced usage, you can access the model classes directly:

```python
from utils.stt import ASRFactory

# Get a specific model instance
whisper_model = ASRFactory.get_model("whisper")
parakeet_model = ASRFactory.get_model("parakeet")

# Use the model directly
text = whisper_model.transcribe("path/to/audio.wav")
```
## Architecture

The refactored code follows these design patterns:

1. **Abstract Base Class**: `ASRModel` defines the interface that all speech recognition models implement
2. **Factory Pattern**: `ASRFactory` creates the appropriate model instance based on the requested model name
3. **Strategy Pattern**: Different model implementations can be swapped at runtime

This architecture makes it easy to add support for additional ASR models in the future.
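To illustrate how these pieces fit together, here is a self-contained sketch of the pattern. The toy `EchoModel` backend, the `register` helper, and the registry layout are illustrative assumptions, not the actual contents of `utils/stt.py`:

```python
from abc import ABC, abstractmethod


class ASRModel(ABC):
    """Interface every speech recognition backend implements."""

    @abstractmethod
    def transcribe(self, audio_path: str) -> str:
        """Return the transcript for the audio file at audio_path."""


class EchoModel(ASRModel):
    # Toy backend used here only to demonstrate the pattern.
    def transcribe(self, audio_path: str) -> str:
        return f"transcript of {audio_path}"


class ASRFactory:
    # Maps model names to implementation classes.
    _registry = {"echo": EchoModel}

    @classmethod
    def register(cls, name: str, model_cls: type) -> None:
        # Adding a new backend is one registry entry.
        cls._registry[name] = model_cls

    @classmethod
    def get_model(cls, name: str) -> ASRModel:
        try:
            return cls._registry[name]()
        except KeyError:
            raise ValueError(f"Unknown ASR model: {name}")


model = ASRFactory.get_model("echo")
print(model.transcribe("demo.wav"))  # transcript of demo.wav
```

Because callers depend only on the `ASRModel` interface, a new backend needs nothing more than a subclass and a registry entry; no call sites change.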