---
title: TeachingAssistant
emoji: 🚀
colorFrom: gray
colorTo: blue
sdk: streamlit
sdk_version: 1.44.1
app_file: app.py
pinned: false
---

Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

# Speech Recognition Module Refactoring

## Overview

The speech recognition module (`utils/stt.py`) has been refactored to support multiple ASR (Automatic Speech Recognition) models. The implementation now follows a factory pattern that allows easy switching between different speech recognition models while maintaining a consistent interface.

## Supported Models

### 1. Whisper (Default)

- Based on OpenAI's Whisper Large-v3 model
- High accuracy for general speech recognition
- No additional installation required

### 2. Parakeet

- NVIDIA's Parakeet-TDT-0.6B model
- Optimized for real-time transcription
- Requires additional installation (see below)

## Installation

### For Parakeet Support

To use the Parakeet model, you need to install the NeMo Toolkit:

```bash
pip install -U 'nemo_toolkit[asr]'
```

Alternatively, you can use the provided requirements file:

```bash
pip install -r requirements-parakeet.txt
```

## Usage

### In the Web Application

The web application now includes a dropdown menu to select the ASR model. Simply choose your preferred model before uploading an audio file.

### Programmatic Usage

```python
from utils.stt import transcribe_audio

# Using the default Whisper model
text = transcribe_audio("path/to/audio.wav")

# Using the Parakeet model
text = transcribe_audio("path/to/audio.wav", model_name="parakeet")
```

### Direct Model Access

For more advanced usage, you can directly access the model classes:

```python
from utils.stt import ASRFactory

# Get a specific model instance
whisper_model = ASRFactory.get_model("whisper")
parakeet_model = ASRFactory.get_model("parakeet")

# Use the model directly
text = whisper_model.transcribe("path/to/audio.wav")
```

## Architecture

The refactored code follows these design patterns:

1. **Abstract Base Class**: `ASRModel` defines the interface for all speech recognition models
2. **Factory Pattern**: `ASRFactory` creates the appropriate model instance based on the requested model name
3. **Strategy Pattern**: Different model implementations can be swapped at runtime

This architecture makes it easy to add support for additional ASR models in the future.
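
For orientation, here is a minimal sketch of how these three patterns can fit together. It reuses the public names shown above (`ASRModel`, `ASRFactory.get_model`, `transcribe_audio`), but the concrete backend class names (`WhisperASR`, `ParakeetASR`), the registry dictionary, and the stubbed `transcribe` bodies are illustrative assumptions, not the actual contents of `utils/stt.py`:

```python
from abc import ABC, abstractmethod


class ASRModel(ABC):
    """Abstract base class: the interface every ASR backend implements."""

    @abstractmethod
    def transcribe(self, audio_path: str) -> str:
        """Return the transcript of the audio file at `audio_path`."""
        ...


class WhisperASR(ASRModel):
    """Hypothetical Whisper backend (Whisper Large-v3 inference would go here)."""

    def transcribe(self, audio_path: str) -> str:
        raise NotImplementedError


class ParakeetASR(ASRModel):
    """Hypothetical Parakeet backend (NeMo Parakeet-TDT-0.6B inference would go here)."""

    def transcribe(self, audio_path: str) -> str:
        raise NotImplementedError


class ASRFactory:
    """Factory: maps a model name to a concrete ASRModel instance."""

    # Assumed registry; the real module may construct models differently.
    _registry = {
        "whisper": WhisperASR,
        "parakeet": ParakeetASR,
    }

    @classmethod
    def get_model(cls, model_name: str = "whisper") -> ASRModel:
        try:
            return cls._registry[model_name]()
        except KeyError:
            raise ValueError(f"Unknown ASR model: {model_name!r}")


def transcribe_audio(audio_path: str, model_name: str = "whisper") -> str:
    """Module-level convenience wrapper matching the usage examples above."""
    return ASRFactory.get_model(model_name).transcribe(audio_path)
```

Under this layout, adding a new backend amounts to subclassing `ASRModel` and registering it with the factory; callers of `transcribe_audio` are unaffected.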