---
title: TeachingAssistant
emoji: π
colorFrom: gray
colorTo: blue
sdk: streamlit
sdk_version: 1.44.1
app_file: app.py
pinned: false
---

Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
# Speech Recognition Module Refactoring

## Overview

The speech recognition module (`utils/stt.py`) has been refactored to support multiple ASR (Automatic Speech Recognition) models. The implementation now follows a factory pattern that allows easy switching between speech recognition models while maintaining a consistent interface.
## Supported Models

### 1. Whisper (Default)

- Based on OpenAI's Whisper Large-v3 model
- High accuracy for general speech recognition
- No additional installation required

### 2. Parakeet

- NVIDIA's Parakeet-TDT-0.6B model
- Optimized for real-time transcription
- Requires additional installation (see below)
## Installation

### For Parakeet Support

To use the Parakeet model, install the NeMo Toolkit:

```bash
pip install -U 'nemo_toolkit[asr]'
```

Alternatively, use the provided requirements file:

```bash
pip install -r requirements-parakeet.txt
```
## Usage

### In the Web Application

The web application now includes a dropdown menu for selecting the ASR model. Simply choose your preferred model before uploading an audio file.

### Programmatic Usage

```python
from utils.stt import transcribe_audio

# Using the default Whisper model
text = transcribe_audio("path/to/audio.wav")

# Using the Parakeet model
text = transcribe_audio("path/to/audio.wav", model_name="parakeet")
```
### Direct Model Access

For more advanced usage, you can access the model classes directly:

```python
from utils.stt import ASRFactory

# Get a specific model instance
whisper_model = ASRFactory.get_model("whisper")
parakeet_model = ASRFactory.get_model("parakeet")

# Use the model directly
text = whisper_model.transcribe("path/to/audio.wav")
```
## Architecture

The refactored code follows these design patterns:

1. **Abstract Base Class**: `ASRModel` defines the interface that all speech recognition models implement
2. **Factory Pattern**: `ASRFactory` creates the appropriate model instance based on the requested model name
3. **Strategy Pattern**: Different model implementations can be swapped at runtime

This architecture makes it easy to add support for additional ASR models in the future.
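To illustrate how these pieces fit together, here is a self-contained sketch of the pattern. The toy `EchoModel` backend, the `register` helper, and the registry layout are illustrative assumptions, not the actual contents of `utils/stt.py`:

```python
from abc import ABC, abstractmethod


class ASRModel(ABC):
    """Interface every speech recognition backend implements."""

    @abstractmethod
    def transcribe(self, audio_path: str) -> str:
        """Return the transcript for the audio file at audio_path."""


class EchoModel(ASRModel):
    # Toy backend used here only to demonstrate the pattern.
    def transcribe(self, audio_path: str) -> str:
        return f"transcript of {audio_path}"


class ASRFactory:
    # Maps model names to implementation classes.
    _registry = {"echo": EchoModel}

    @classmethod
    def register(cls, name: str, model_cls: type) -> None:
        # Adding a new backend is one registry entry.
        cls._registry[name] = model_cls

    @classmethod
    def get_model(cls, name: str) -> ASRModel:
        try:
            return cls._registry[name]()
        except KeyError:
            raise ValueError(f"Unknown ASR model: {name}")


model = ASRFactory.get_model("echo")
print(model.transcribe("demo.wav"))  # transcript of demo.wav
```

Because callers depend only on the `ASRModel` interface, a new backend needs nothing more than a subclass and a registry entry; no call sites change.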