teachingAssistant / README.md
Michael Hu

metadata
title: TeachingAssistant
emoji: 🚀
colorFrom: gray
colorTo: blue
sdk: streamlit
sdk_version: 1.44.1
app_file: app.py
pinned: false

Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

Speech Recognition Module Refactoring

Overview

The speech recognition module (utils/stt.py) has been refactored to support multiple ASR (Automatic Speech Recognition) models. The implementation now follows a factory pattern that allows easy switching between different speech recognition models while maintaining a consistent interface.

Supported Models

1. Whisper (Default)

  • Based on OpenAI's Whisper Large-v3 model
  • High accuracy for general speech recognition
  • No additional installation required

2. Parakeet

  • NVIDIA's Parakeet-TDT-0.6B model
  • Optimized for real-time transcription
  • Requires additional installation (see below)

Installation

For Parakeet Support

To use the Parakeet model, you need to install the NeMo Toolkit:

pip install -U 'nemo_toolkit[asr]'

Alternatively, you can use the provided requirements file:

pip install -r requirements-parakeet.txt
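
Because Parakeet depends on an optional package, it can help to check for NeMo before requesting that model. The following is a minimal sketch, not part of utils/stt.py: `parakeet_available` and `pick_model_name` are hypothetical helper names, and the check only assumes that the NeMo Toolkit installs a top-level `nemo` package.

```python
import importlib.util


def parakeet_available() -> bool:
    """Return True if the `nemo` package can be found (without importing it)."""
    return importlib.util.find_spec("nemo") is not None


def pick_model_name(preferred: str = "parakeet") -> str:
    """Hypothetical helper: fall back to "whisper" when NeMo is not installed."""
    if preferred == "parakeet" and not parakeet_available():
        return "whisper"
    return preferred
```

The returned name could then be passed as `model_name` to `transcribe_audio`.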

Usage

In the Web Application

The web application now includes a dropdown menu to select the ASR model. Simply choose your preferred model before uploading an audio file.
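
A dropdown like the one described above could be wired up roughly as follows. This is an illustrative sketch, not the actual contents of app.py: the labels, the accepted file types, the `MODEL_CHOICES` mapping, and the `render_asr_section` function are all assumptions. It only relies on `transcribe_audio` accepting a file path and a `model_name`, as shown under Programmatic Usage.

```python
import os
import tempfile

# Assumed mapping from dropdown labels to the model_name values used by utils.stt.
MODEL_CHOICES = {"Whisper (default)": "whisper", "Parakeet": "parakeet"}


def render_asr_section() -> None:
    """Hypothetical Streamlit section: pick a model, upload audio, show the text."""
    import streamlit as st  # imported here so the mapping is usable without Streamlit
    from utils.stt import transcribe_audio

    label = st.selectbox("ASR model", list(MODEL_CHOICES))
    uploaded = st.file_uploader("Upload audio", type=["wav", "mp3"])
    if uploaded is None:
        return

    # transcribe_audio takes a path, so the upload is written to a temporary file.
    suffix = os.path.splitext(uploaded.name)[1]
    with tempfile.NamedTemporaryFile(delete=False, suffix=suffix) as tmp:
        tmp.write(uploaded.getvalue())
    try:
        st.write(transcribe_audio(tmp.name, model_name=MODEL_CHOICES[label]))
    finally:
        os.remove(tmp.name)
```

In this sketch, app.py would call `render_asr_section()` at the point where the upload widget should appear.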

Programmatic Usage

from utils.stt import transcribe_audio

# Using the default Whisper model
text = transcribe_audio("path/to/audio.wav")

# Using the Parakeet model
text = transcribe_audio("path/to/audio.wav", model_name="parakeet")

Direct Model Access

For more advanced usage, you can directly access the model classes:

from utils.stt import ASRFactory

# Get a specific model instance
whisper_model = ASRFactory.get_model("whisper")
parakeet_model = ASRFactory.get_model("parakeet")

# Use the model directly
text = whisper_model.transcribe("path/to/audio.wav")
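
One reason to hold a model instance directly is to reuse it across many files. The sketch below assumes only that the instance exposes `transcribe(path)` as shown above. `transcribe_folder` and the `Transcriber` protocol are hypothetical names introduced for illustration.

```python
from pathlib import Path
from typing import Protocol


class Transcriber(Protocol):
    """Anything with a transcribe(path) -> str method, e.g. an ASRFactory model."""

    def transcribe(self, audio_path: str) -> str: ...


def transcribe_folder(
    model: Transcriber, folder: str, pattern: str = "*.wav"
) -> dict[str, str]:
    """Transcribe every file in `folder` matching `pattern` with one model instance.

    Returns a mapping from file name to transcript, in sorted file-name order.
    """
    results: dict[str, str] = {}
    for path in sorted(Path(folder).glob(pattern)):
        results[path.name] = model.transcribe(str(path))
    return results
```

For example, `transcribe_folder(ASRFactory.get_model("whisper"), "recordings")` would transcribe each `.wav` file in a `recordings` directory (a placeholder path).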

Architecture

The refactored code follows these design patterns:

  1. Abstract Base Class: ASRModel defines the interface for all speech recognition models
  2. Factory Pattern: ASRFactory creates the appropriate model instance based on the requested model name
  3. Strategy Pattern: Different model implementations can be swapped at runtime

Because callers depend only on the ASRModel interface, support for an additional ASR model can be added by implementing that interface and making it available through ASRFactory.
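
The three patterns can be sketched together as follows. This is a simplified illustration, not the code in utils/stt.py: the two model classes are stand-ins that load no real model, and the `_registry` dictionary and `register` method are assumptions about how a factory like this might be organized.

```python
from abc import ABC, abstractmethod


class ASRModel(ABC):
    """Interface that every speech recognition model implements."""

    @abstractmethod
    def transcribe(self, audio_path: str) -> str:
        """Return the transcript for the audio file at `audio_path`."""


class DummyWhisper(ASRModel):
    """Stand-in for the Whisper implementation (no real model is loaded)."""

    def transcribe(self, audio_path: str) -> str:
        return f"[whisper] {audio_path}"


class DummyParakeet(ASRModel):
    """Stand-in for the Parakeet implementation (no real model is loaded)."""

    def transcribe(self, audio_path: str) -> str:
        return f"[parakeet] {audio_path}"


class ASRFactory:
    """Creates a model instance from its name."""

    _registry: dict[str, type[ASRModel]] = {
        "whisper": DummyWhisper,
        "parakeet": DummyParakeet,
    }

    @classmethod
    def register(cls, name: str, model_cls: type[ASRModel]) -> None:
        """Hypothetical extension point for adding a new model."""
        cls._registry[name.lower()] = model_cls

    @classmethod
    def get_model(cls, name: str = "whisper") -> ASRModel:
        try:
            return cls._registry[name.lower()]()
        except KeyError:
            raise ValueError(f"Unknown ASR model: {name!r}") from None


def transcribe_audio(audio_path: str, model_name: str = "whisper") -> str:
    # Strategy pattern: the caller picks the implementation by name at runtime.
    return ASRFactory.get_model(model_name).transcribe(audio_path)
```

In this sketch, adding a model means subclassing `ASRModel` and calling `ASRFactory.register("name", NewModel)`. Code that calls `transcribe_audio` stays the same.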