---
title: Streaming Zipformer
emoji: 👀
colorFrom: blue
colorTo: purple
sdk: docker
pinned: false
license: mit
short_description: Streaming zipformer
---

🎙️ Real-Time Streaming ASR Demo (FastAPI + Sherpa-ONNX)

This project demonstrates a real-time speech-to-text (ASR) web application using:

🧠 Sherpa-ONNX streaming Zipformer model
🚀 FastAPI backend with WebSocket support
🧑‍💻 Hugging Face Spaces (Docker CPU-only deployment)
🌐 Browser-based microphone input + UI in vanilla HTML/JS
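This sketch shows what the server might do with each microphone frame. It assumes the browser sends every chunk as a raw little-endian `Float32Array` buffer over the WebSocket; the actual wire format is not documented here. It also computes the RMS level in dBFS that a volume indicator could display. The meter presumably runs in the page's JS, so this Python version only illustrates the arithmetic. `decode_float32_frame`, `rms_dbfs`, and the -100 dB floor are hypothetical names and values.

```python
import math
import struct
from typing import List

SILENCE_FLOOR_DB = -100.0  # hypothetical floor so silence never reaches log10(0)


def decode_float32_frame(data: bytes) -> List[float]:
    """Decode a little-endian Float32 PCM frame (assumed to be a JS Float32Array buffer)."""
    if len(data) % 4:
        raise ValueError("frame length must be a multiple of 4 bytes")
    return list(struct.unpack(f"<{len(data) // 4}f", data))


def rms_dbfs(samples: List[float]) -> float:
    """RMS level in dBFS; a constant +/-1.0 signal is 0 dBFS, a full-scale sine about -3 dBFS."""
    if not samples:
        return SILENCE_FLOOR_DB
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    if rms == 0.0:
        return SILENCE_FLOOR_DB
    return max(SILENCE_FLOOR_DB, 20.0 * math.log10(rms))
```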

📦 Model

This app uses the following bilingual (Chinese-English) streaming model:

🔗 Model Source: Zipformer Small Bilingual zh-en (2023-02-16)

Model files (ONNX) are located under:

models/zipformer_bilingual/
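The following is a rough sketch of how the files in this directory might be loaded. It assumes the usual sherpa-onnx transducer layout (`encoder*.onnx`, `decoder*.onnx`, `joiner*.onnx`, `tokens.txt`). Because the exact filenames are not listed here, the helper globs for them. `find_model_files` and `build_recognizer` are hypothetical helpers, not functions from this project. `sherpa_onnx.OnlineRecognizer.from_transducer` is the sherpa-onnx Python API. Preferring int8-quantized files when present is an assumption aimed at CPU-only hosts.

```python
from pathlib import Path
from typing import Dict, Union


def _pick(model_dir: Path, prefix: str, prefer_int8: bool) -> str:
    """Return one <prefix>*.onnx file, optionally preferring the int8-quantized variant."""
    matches = sorted(model_dir.glob(f"{prefix}*.onnx"))
    if not matches:
        raise FileNotFoundError(f"no {prefix}*.onnx found in {model_dir}")
    int8 = [m for m in matches if ".int8." in m.name]
    fp32 = [m for m in matches if ".int8." not in m.name]
    chosen = (int8 or fp32) if prefer_int8 else (fp32 or int8)
    return str(chosen[0])


def find_model_files(model_dir: Union[str, Path], prefer_int8: bool = True) -> Dict[str, str]:
    """Hypothetical helper: locate the transducer files inside model_dir."""
    model_dir = Path(model_dir)
    tokens = model_dir / "tokens.txt"
    if not tokens.is_file():
        raise FileNotFoundError(f"missing {tokens}")
    return {
        "encoder": _pick(model_dir, "encoder", prefer_int8),
        "decoder": _pick(model_dir, "decoder", prefer_int8),
        "joiner": _pick(model_dir, "joiner", prefer_int8),
        "tokens": str(tokens),
    }


def build_recognizer(model_dir: Union[str, Path] = "models/zipformer_bilingual", num_threads: int = 2):
    """Hypothetical helper: create a streaming recognizer from the located files."""
    import sherpa_onnx  # imported here so find_model_files() works without sherpa-onnx

    files = find_model_files(model_dir)
    return sherpa_onnx.OnlineRecognizer.from_transducer(
        tokens=files["tokens"],
        encoder=files["encoder"],
        decoder=files["decoder"],
        joiner=files["joiner"],
        num_threads=num_threads,
        sample_rate=16000,  # audio is resampled to 16 kHz before reaching the model
        feature_dim=80,
        decoding_method="greedy_search",
        enable_endpoint_detection=True,  # lets is_endpoint() split utterances
    )
```

Because `build_recognizer()` imports sherpa-onnx lazily, the path lookup can be checked without the package installed.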

🚀 Features

🎤 Real-time microphone input (captured in browser)
🔁 WebSocket-based streaming inference
💬 Partial + final transcription
🌏 Automatic conversion to Traditional Chinese using OpenCC
📊 Real-time volume indicator
☁️ Deployed on Hugging Face Spaces (CPU only)
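The partial/final split can be sketched with the sherpa-onnx streaming loop. Each incoming chunk goes through four steps:

1. Feed the chunk to the stream.
2. Decode while frames are ready.
3. Send the current hypothesis as a partial result.
4. When the endpoint detector fires, send the hypothesis as a final result and reset the stream.

This assumes the recognizer was created with endpoint detection enabled. The `process_chunk` helper and the JSON message shape (`type`, `text`) are hypothetical, not this app's actual protocol.

```python
import json
from typing import Any, List, Sequence


def process_chunk(recognizer: Any, stream: Any, samples: Sequence[float],
                  sample_rate: int = 16000) -> List[str]:
    """Feed one 16 kHz chunk and return JSON messages to push over the WebSocket."""
    stream.accept_waveform(sample_rate, samples)
    while recognizer.is_ready(stream):  # decode every frame the chunk made available
        recognizer.decode_stream(stream)
    text = recognizer.get_result(stream)
    messages = []
    if recognizer.is_endpoint(stream):
        if text:  # utterance finished: emit it once as final
            messages.append(json.dumps({"type": "final", "text": text}, ensure_ascii=False))
        recognizer.reset(stream)  # start a fresh hypothesis for the next utterance
    elif text:
        messages.append(json.dumps({"type": "partial", "text": text}, ensure_ascii=False))
    return messages
```

In a FastAPI WebSocket handler, each returned string could be sent with `await websocket.send_text(msg)`.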

🧪 Local Development

1. Install dependencies

pip install -r requirements.txt

2. Run the app locally

uvicorn app.main:app --reload --host 0.0.0.0 --port 8501

Then open: http://localhost:8501

🐳 Deploy on Hugging Face Spaces

This repo includes a Dockerfile compatible with HF Spaces. It uses:

uvicorn for serving the FastAPI app
opencc-python-reimplemented for Simplified → Traditional Chinese
pysoxr or SciPy for audio resampling (48 kHz → 16 kHz)
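The browser typically captures audio at 48 kHz (the exact rate depends on the device), while the model works on 16 kHz input. The sketch below shows the resampling step with SciPy's polyphase resampler, which applies an anti-aliasing low-pass filter before decimating. `resample_to_16k` is a hypothetical name. With the pysoxr route, python-soxr's `soxr.resample(x, 48000, 16000)` would perform the equivalent conversion.

```python
from math import gcd

import numpy as np
from scipy.signal import resample_poly


def resample_to_16k(samples: np.ndarray, src_rate: int = 48000, dst_rate: int = 16000) -> np.ndarray:
    """Polyphase-resample mono float audio from src_rate to dst_rate, returning float32."""
    if src_rate == dst_rate:
        return samples.astype(np.float32, copy=False)
    g = gcd(src_rate, dst_rate)
    up, down = dst_rate // g, src_rate // g  # 48 kHz -> 16 kHz gives up=1, down=3
    out = resample_poly(samples, up, down)  # FIR low-pass + rational rate change
    return out.astype(np.float32, copy=False)
```

Reducing the ratio with `gcd` lets the same function handle 44.1 kHz devices (up=160, down=441) without changes.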

📁 Project Structure

.
├── app
│   ├── main.py               # FastAPI + WebSocket
│   ├── asr_worker.py         # Sherpa inference + resampling + OpenCC
│   └── static/index.html     # Client-side mic UI
├── models/zipformer_bilingual/
│   └── ... (onnx, tokens.txt)
├── requirements.txt
├── Dockerfile
└── README.md

🔧 Credits

🗣 Languages Supported

🇨🇳 Chinese (Simplified input, converted to Traditional)
🇺🇸 English
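The Simplified → Traditional step can be sketched with opencc-python-reimplemented. Its `OpenCC("s2t")` converter maps only Chinese characters and phrases, so mixed zh-en transcripts can be passed through whole. `to_traditional` is a hypothetical helper, and its `convert` parameter exists only so the function can be tested without OpenCC installed. Whether the app uses plain `s2t` or a regional variant such as `s2tw` or `s2twp` is an assumption.

```python
from functools import lru_cache
from typing import Callable, Optional


@lru_cache(maxsize=1)
def _s2t_converter():
    """Build the OpenCC converter once and reuse it for every transcript."""
    from opencc import OpenCC  # provided by opencc-python-reimplemented

    return OpenCC("s2t")


def to_traditional(text: str, convert: Optional[Callable[[str], str]] = None) -> str:
    """Convert a (possibly mixed zh-en) transcript from Simplified to Traditional Chinese."""
    if not text:
        return text  # skip converter setup for empty partial results
    if convert is None:
        convert = _s2t_converter().convert
    return convert(text)
```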

🤝 Contributing

PRs welcome! Feel free to fork this project and adapt it to other models or languages.

📜 License

Apache 2.0