title: Streaming Zipformer
emoji: π
colorFrom: blue
colorTo: purple
sdk: docker
pinned: false
license: mit
short_description: Streaming zipformer
Real-Time Streaming ASR Demo (FastAPI + Sherpa-ONNX)
This project demonstrates a real-time speech-to-text (ASR) web application using:
- Sherpa-ONNX streaming Zipformer model
- FastAPI backend with WebSocket support
- Hugging Face Spaces (Docker CPU-only deployment)
- Browser-based microphone input + UI in vanilla HTML/JS
Model
This app uses the following bilingual (Chinese-English) streaming model:
Model Source: Zipformer Small Bilingual zh-en (2023-02-16)
Model files (ONNX) are located under:
models/zipformer_bilingual/
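As a rough illustration of how the files in that directory might be wired into Sherpa-ONNX, the sketch below locates the transducer parts and passes them to `sherpa_onnx.OnlineRecognizer.from_transducer`. The `encoder*`/`decoder*`/`joiner*` filename prefixes, the helper names, and the decoding settings are assumptions made for this example, not taken from the repo; only `tokens.txt` and the directory path come from this README.

```python
from pathlib import Path

MODEL_DIR = Path("models/zipformer_bilingual")  # path taken from this README


def find_model_files(model_dir: Path) -> dict:
    """Locate transducer files by filename prefix (the prefixes are an assumption)."""
    files = {"tokens": model_dir / "tokens.txt"}
    if not files["tokens"].is_file():
        raise FileNotFoundError(files["tokens"])
    for part in ("encoder", "decoder", "joiner"):
        # If several variants exist, the alphabetically first one is used.
        matches = sorted(model_dir.glob(f"{part}*.onnx"))
        if not matches:
            raise FileNotFoundError(f"no {part}*.onnx in {model_dir}")
        files[part] = matches[0]
    return {name: str(path) for name, path in files.items()}


def build_recognizer(model_dir: Path = MODEL_DIR):
    """Create a streaming recognizer; the settings here are illustrative."""
    import sherpa_onnx  # imported lazily so find_model_files works without it

    return sherpa_onnx.OnlineRecognizer.from_transducer(
        **find_model_files(model_dir),
        num_threads=1,
        sample_rate=16000,
        feature_dim=80,
        decoding_method="greedy_search",
    )
```

Calling `build_recognizer()` once at startup and reusing the result for every connection avoids reloading the ONNX files per request.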
Features
- Real-time microphone input (captured in browser)
- WebSocket-based streaming inference
- Partial + final transcription
- Automatic conversion to Traditional Chinese using OpenCC
- Real-time volume indicator
- Deployed on Hugging Face Spaces (CPU only)
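The arithmetic behind a volume indicator is usually the RMS level of each audio chunk. The sketch below shows that calculation in Python, assuming chunks arrive as little-endian 16-bit mono PCM; the function names, the -96 dB silence floor, and the -60 dB meter range are illustrative choices, and the demo itself may compute the level differently (for example in the browser).

```python
import math
import struct


def rms_dbfs(pcm16: bytes, floor_db: float = -96.0) -> float:
    """Return the RMS level of little-endian 16-bit mono PCM in dBFS."""
    n = len(pcm16) // 2
    if n == 0:
        return floor_db
    samples = struct.unpack(f"<{n}h", pcm16[: n * 2])
    mean_square = sum(s * s for s in samples) / n
    if mean_square == 0:
        return floor_db  # digital silence: avoid log10(0)
    rms = math.sqrt(mean_square) / 32768.0  # normalise to full scale
    return max(floor_db, 20.0 * math.log10(rms))


def level_percent(db: float, floor_db: float = -60.0) -> int:
    """Map a dBFS value onto a 0-100 meter position (floor_db maps to 0)."""
    clamped = min(0.0, max(floor_db, db))
    return round(100 * (1 - clamped / floor_db))
```

A decibel scale is used because perceived loudness is closer to logarithmic than linear, so a linear RMS meter would sit near zero for most speech.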
Local Development
1. Install dependencies
pip install -r requirements.txt
2. Run the app locally
uvicorn app.main:app --reload --host 0.0.0.0 --port 8501
Then open: http://localhost:8501
Deploy on Hugging Face Spaces
This repo includes a Dockerfile compatible with HF Spaces. It uses:
- uvicorn for serving the FastAPI app
- opencc-python-reimplemented for Simplified → Traditional Chinese conversion
- pysoxr or scipy for audio resampling (48 kHz → 16 kHz)
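To make the 48 kHz → 16 kHz step concrete: the ratio is exactly 3:1, so every three input samples become one output sample. The sketch below is a deliberately crude stand-in that averages each group of three samples using only the standard library. It is not what the app does (the app relies on pysoxr or scipy, whose filters are far better at suppressing aliasing), and the function name and the 16-bit mono PCM input format are assumptions.

```python
from array import array


def downsample_48k_to_16k(pcm16: bytes) -> bytes:
    """Decimate native-endian 16-bit mono PCM by 3, averaging each group of 3."""
    samples = array("h")
    samples.frombytes(pcm16[: len(pcm16) // 2 * 2])  # drop a trailing odd byte
    usable = len(samples) - len(samples) % 3         # drop an incomplete group
    out = array(
        "h",
        (
            (samples[i] + samples[i + 1] + samples[i + 2]) // 3
            for i in range(0, usable, 3)
        ),
    )
    return out.tobytes()
```

Averaging acts as a weak low-pass filter before decimation; for recognition quality a proper resampler is the better choice.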
Project Structure
.
├── app
│   ├── main.py              # FastAPI + WebSocket
│   ├── asr_worker.py        # Sherpa inference + resampling + OpenCC
│   └── static/index.html    # Client-side mic UI
├── models/zipformer_bilingual/
│   └── ... (onnx, tokens.txt)
├── requirements.txt
├── Dockerfile
└── README.md
Credits
Languages Supported
- Chinese (Simplified input, converted to Traditional)
- English
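The Simplified-to-Traditional conversion can be sketched with opencc-python-reimplemented and its `s2t` configuration. The wrapper name `to_traditional` and the pass-through fallback when the package is missing are assumptions for this example, not the app's actual code.

```python
try:
    from opencc import OpenCC  # provided by opencc-python-reimplemented

    _converter = OpenCC("s2t")  # Simplified Chinese -> Traditional Chinese
except ImportError:
    _converter = None  # illustrative fallback: leave text unconverted


def to_traditional(text: str) -> str:
    """Convert Simplified Chinese to Traditional; other text passes through."""
    if _converter is None:
        return text
    return _converter.convert(text)
```

English words in a mixed transcript are unaffected, so the same function can be applied to every partial and final result.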
Contributing
PRs welcome! Feel free to fork this and adapt to other models or languages.
License
Apache 2.0