---
title: TeachingAssistant
emoji: π
colorFrom: gray
colorTo: blue
sdk: streamlit
sdk_version: 1.44.1
app_file: app.py
pinned: false
---
Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
# Speech Recognition Module Refactoring
## Overview
The speech recognition module (`utils/stt.py`) has been refactored to support multiple ASR (Automatic Speech Recognition) models. The implementation now follows a factory pattern that allows easy switching between different speech recognition models while maintaining a consistent interface.
## Supported Models
### 1. Whisper (Default)
- Based on OpenAI's Whisper Large-v3 model
- High accuracy for general speech recognition
- No additional installation required
### 2. Parakeet
- NVIDIA's Parakeet-TDT-0.6B model
- Optimized for real-time transcription
- Requires additional installation (see below)
## Installation
### For Parakeet Support
To use the Parakeet model, you need to install the NeMo Toolkit:
```bash
pip install -U 'nemo_toolkit[asr]'
```
Alternatively, you can use the provided requirements file:
```bash
pip install -r requirements-parakeet.txt
```
## Usage
### In the Web Application
The web application now includes a dropdown menu to select the ASR model. Simply choose your preferred model before uploading an audio file.
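As a rough illustration, the model selection in `app.py` could be wired up with Streamlit roughly as follows. This is a minimal sketch, not the actual implementation: the widget labels, the `SUPPORTED_MODELS` list, and the temporary-file handling are illustrative assumptions.

```python
import tempfile

import streamlit as st

from utils.stt import transcribe_audio

# Hypothetical sketch: labels and model list are illustrative, not taken from app.py.
SUPPORTED_MODELS = ["whisper", "parakeet"]

model_name = st.selectbox("ASR model", SUPPORTED_MODELS)
audio_file = st.file_uploader("Upload an audio file", type=["wav", "mp3"])

if audio_file is not None:
    # Persist the upload to a temporary file so transcribe_audio can read a path.
    with tempfile.NamedTemporaryFile(suffix=".wav", delete=False) as tmp:
        tmp.write(audio_file.read())
        audio_path = tmp.name

    st.write(transcribe_audio(audio_path, model_name=model_name))
```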
### Programmatic Usage
```python
from utils.stt import transcribe_audio
# Using the default Whisper model
text = transcribe_audio("path/to/audio.wav")
# Using the Parakeet model
text = transcribe_audio("path/to/audio.wav", model_name="parakeet")
```
### Direct Model Access
For more advanced usage, you can directly access the model classes:
```python
from utils.stt import ASRFactory
# Get a specific model instance
whisper_model = ASRFactory.get_model("whisper")
parakeet_model = ASRFactory.get_model("parakeet")
# Use the model directly
text = whisper_model.transcribe("path/to/audio.wav")
```
## Architecture
The refactored code follows these design patterns:
1. **Abstract Base Class**: `ASRModel` defines the interface for all speech recognition models
2. **Factory Pattern**: `ASRFactory` creates the appropriate model instance based on the requested model name
3. **Strategy Pattern**: Different model implementations can be swapped at runtime
This architecture makes it easy to add support for additional ASR models in the future.
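For orientation, the structure can be pictured roughly as in the sketch below. It is a simplified outline of the pattern under the assumption that each backend class (the `WhisperASR` and `ParakeetASR` names here are hypothetical) implements the `ASRModel` interface and is looked up by `ASRFactory.get_model`; it is not the exact contents of `utils/stt.py`.

```python
from abc import ABC, abstractmethod


class ASRModel(ABC):
    """Interface that every speech recognition backend implements."""

    @abstractmethod
    def transcribe(self, audio_path: str) -> str:
        """Return the transcript of the audio file at audio_path."""


class WhisperASR(ASRModel):  # hypothetical class name
    def transcribe(self, audio_path: str) -> str:
        ...  # load Whisper Large-v3 and run inference


class ParakeetASR(ASRModel):  # hypothetical class name
    def transcribe(self, audio_path: str) -> str:
        ...  # load Parakeet-TDT-0.6B via NeMo and run inference


class ASRFactory:
    """Maps a model name to the matching ASRModel implementation."""

    _models = {"whisper": WhisperASR, "parakeet": ParakeetASR}

    @classmethod
    def get_model(cls, model_name: str = "whisper") -> ASRModel:
        if model_name not in cls._models:
            raise ValueError(f"Unknown ASR model: {model_name}")
        return cls._models[model_name]()
```

Adding a new backend then amounts to writing one more `ASRModel` subclass and registering it with the factory, which is what keeps the public `transcribe_audio` interface stable.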