Streaming-Zipformer / README.md
Luigi's picture
update readme
cd954ca
|
raw
history blame
2.76 kB
metadata
title: Streaming Zipformer
emoji: πŸ‘€
colorFrom: blue
colorTo: purple
sdk: docker
pinned: false
license: mit
short_description: Streaming zipformer

πŸŽ™οΈ Real-Time Streaming ASR Demo (FastAPI + Sherpa-ONNX)

This project demonstrates a real-time speech-to-text (ASR) web application using:

  • 🧠 Sherpa-ONNX streaming Zipformer model
  • πŸš€ FastAPI backend with WebSocket support
  • πŸ§‘β€πŸ’» Hugging Face Spaces (Docker CPU-only deployment)
  • 🌐 Browser-based microphone input + UI in vanilla HTML/JS

πŸ“¦ Model

This app uses the following bilingual (Chinese-English) streaming model:

πŸ”— Model Source: Zipformer Small Bilingual zh-en (2023-02-16)

Model files (ONNX) are located under:

models/zipformer_bilingual/

πŸš€ Features

  • 🎀 Real-time microphone input (captured in browser)
  • πŸ” WebSocket-based streaming inference
  • πŸ’¬ Partial + final transcription
  • 🌏 Automatic conversion to Traditional Chinese using OpenCC
  • πŸ“Š Real-time volume indicator
  • ☁️ Deployed on Hugging Face Spaces (CPU only)

πŸ§ͺ Local Development

1. Install dependencies

pip install -r requirements.txt

2. Run the app locally

uvicorn app.main:app --reload --host 0.0.0.0 --port 8501

Then open: http://localhost:8501


🐳 Deploy on Hugging Face Spaces

This repo includes a Dockerfile compatible with HF Spaces. It uses:

  • uvicorn for serving the FastAPI app
  • opencc-python-reimplemented for Simplified β†’ Traditional Chinese
  • pysoxr or scipy for audio resampling (48kHz β†’ 16kHz)

πŸ“ Project Structure

.
β”œβ”€β”€ app
β”‚   β”œβ”€β”€ main.py               # FastAPI + WebSocket
β”‚   β”œβ”€β”€ asr_worker.py         # Sherpa inference + resampling + OpenCC
β”‚   └── static/index.html     # Client-side mic UI
β”œβ”€β”€ models/zipformer_bilingual/
β”‚   └── ... (onnx, tokens.txt)
β”œβ”€β”€ requirements.txt
β”œβ”€β”€ Dockerfile
└── README.md

πŸ”§ Credits


πŸ—£ Languages Supported

  • πŸ‡¨πŸ‡³ Chinese (Simplified input, converted to Traditional)
  • πŸ‡ΊπŸ‡Έ English

🀝 Contributing

PRs welcome! Feel free to fork this and adapt to other models or languages.


πŸ“œ License

Apache 2.0