Luigi committed cd954ca (1 parent: 231cd3a)

update readme

Files changed (1):
  1. README.md +107 -1
README.md CHANGED
@@ -9,4 +9,110 @@ license: mit
  short_description: Streaming zipformer
  ---

- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

# 🎙️ Real-Time Streaming ASR Demo (FastAPI + Sherpa-ONNX)

This project demonstrates a real-time speech-to-text (ASR) web application using:

* 🧠 [Sherpa-ONNX](https://github.com/k2-fsa/sherpa-onnx) streaming Zipformer model
* 🚀 FastAPI backend with WebSocket support
* 🧑‍💻 Hugging Face Spaces (Docker, CPU-only deployment)
* 🌐 Browser-based microphone input + UI in vanilla HTML/JS

---

## 📦 Model

This app uses the following **bilingual (Chinese-English)** streaming model:

**🔗 Model Source:**
[Zipformer Small Bilingual zh-en (2023-02-16)](https://k2-fsa.github.io/sherpa/onnx/pretrained_models/online-transducer/zipformer-transducer-models.html#sherpa-onnx-streaming-zipformer-small-bilingual-zh-en-2023-02-16-bilingual-chinese-english)

Model files (ONNX) are located under:

```
models/zipformer_bilingual/
```
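The following is a minimal sketch of loading this model with the sherpa-onnx Python package (`pip install sherpa-onnx`). It assumes the directory holds exactly one non-int8 `encoder*.onnx`, `decoder*.onnx` and `joiner*.onnx` file, plus `tokens.txt`. The glob patterns, the thread count and the helper names `find_model_files` and `build_recognizer` are illustrative assumptions, not code taken from `app/asr_worker.py`.

```python
from pathlib import Path


def find_model_files(model_dir: str) -> dict:
    """Return paths to the encoder, decoder, joiner and tokens files.

    Hypothetical helper: assumes exactly one non-int8 file per component,
    named like ``encoder*.onnx`` (an assumption about the directory layout).
    """
    root = Path(model_dir)
    files = {}
    for part in ("encoder", "decoder", "joiner"):
        matches = sorted(p for p in root.glob(f"{part}*.onnx") if "int8" not in p.name)
        if len(matches) != 1:
            raise FileNotFoundError(
                f"expected one {part}*.onnx in {root}, found {len(matches)}"
            )
        files[part] = str(matches[0])
    tokens = root / "tokens.txt"
    if not tokens.is_file():
        raise FileNotFoundError(f"missing {tokens}")
    files["tokens"] = str(tokens)
    return files


def build_recognizer(model_dir: str = "models/zipformer_bilingual"):
    """Create a streaming transducer recognizer for 16 kHz audio."""
    import sherpa_onnx  # lazy import so find_model_files works without sherpa-onnx

    f = find_model_files(model_dir)
    return sherpa_onnx.OnlineRecognizer.from_transducer(
        tokens=f["tokens"],
        encoder=f["encoder"],
        decoder=f["decoder"],
        joiner=f["joiner"],
        num_threads=2,  # assumption: small CPU-only Space
        sample_rate=16000,
        feature_dim=80,
        decoding_method="greedy_search",
        enable_endpoint_detection=True,  # lets is_endpoint() mark final results
    )
```

Keeping the file lookup separate from the `sherpa_onnx` import means a missing or misnamed file fails with a clear error before any model is loaded.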

---

## 🚀 Features

* 🎤 Real-time microphone input (captured in the browser)
* 🔁 WebSocket-based streaming inference
* 💬 Partial and final transcriptions
* 🌏 Automatic conversion to **Traditional Chinese** using OpenCC
* 📊 Real-time volume indicator
* ☁️ Deployed on Hugging Face Spaces (CPU only)
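The partial/final behaviour can be sketched as one function called for each audio chunk that the WebSocket receives. This sketch assumes a recognizer created with endpoint detection enabled. It uses the sherpa-onnx `OnlineRecognizer` methods `is_ready`, `decode_stream`, `get_result`, `is_endpoint` and `reset`, together with `OnlineStream.accept_waveform`. The JSON message shape (`{"type": "partial" | "final", "text": ...}`) and the name `process_chunk` are hypothetical; they are not the actual protocol used by `app/main.py`.

```python
import json
from typing import Callable, List, Sequence


def process_chunk(
    recognizer,
    stream,
    samples: Sequence[float],
    sample_rate: int = 16000,
    convert: Callable[[str], str] = lambda s: s,
) -> List[str]:
    """Feed one chunk of 16 kHz audio and return the JSON messages to send.

    Hypothetical helper; `convert` is a hook for Simplified -> Traditional
    conversion (e.g. OpenCC) and defaults to returning the text unchanged.
    """
    stream.accept_waveform(sample_rate, samples)
    # Decode every block of frames that is ready so far.
    while recognizer.is_ready(stream):
        recognizer.decode_stream(stream)

    text = recognizer.get_result(stream)
    messages = []
    if recognizer.is_endpoint(stream):
        # Trailing silence detected: send the utterance as final and start over.
        if text:
            messages.append(
                json.dumps({"type": "final", "text": convert(text)}, ensure_ascii=False)
            )
        recognizer.reset(stream)
    elif text:
        # Still speaking: send the running hypothesis as a partial result.
        messages.append(
            json.dumps({"type": "partial", "text": convert(text)}, ensure_ascii=False)
        )
    return messages
```

A handler would typically create one stream per connection with `recognizer.create_stream()` and forward each returned string with `await websocket.send_text(msg)`.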

---

## 🧪 Local Development

### 1. Install dependencies

```bash
pip install -r requirements.txt
```

### 2. Run the app locally

```bash
uvicorn app.main:app --reload --host 0.0.0.0 --port 8501
```

Then open [http://localhost:8501](http://localhost:8501).

---

## 🐳 Deploy on Hugging Face Spaces

This repo includes a `Dockerfile` compatible with HF Spaces. It uses:

* `uvicorn` for serving the FastAPI app
* `opencc-python-reimplemented` for Simplified → Traditional Chinese conversion
* `pysoxr` or `scipy` for audio resampling (48 kHz → 16 kHz)
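The resampling step is a rational-rate polyphase filter. For the rates listed above, 48 kHz → 16 kHz is an exact 1:3 decimation. Below is a sketch using `scipy.signal.resample_poly`, one of the two options listed above. The function name `resample_to_16k` is hypothetical, and the input is assumed to be mono float audio.

```python
from math import gcd

import numpy as np
from scipy.signal import resample_poly


def resample_to_16k(samples, src_rate: int, dst_rate: int = 16000) -> np.ndarray:
    """Resample mono float audio from src_rate to dst_rate (polyphase FIR)."""
    x = np.asarray(samples, dtype=np.float32)
    if src_rate == dst_rate:
        return x
    g = gcd(src_rate, dst_rate)
    # 48000 -> 16000 gives up=1, down=3; 44100 -> 16000 gives up=160, down=441.
    y = resample_poly(x, up=dst_rate // g, down=src_rate // g)
    return y.astype(np.float32)
```

Polyphase resampling applies the anti-aliasing low-pass filter and the rate change in one step. This matters because naive slicing (`x[::3]`) would fold energy above 8 kHz back into the speech band.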

---

## 📁 Project Structure

```
.
├── app
│   ├── main.py            # FastAPI + WebSocket
│   ├── asr_worker.py      # Sherpa inference + resampling + OpenCC
│   └── static/index.html  # Client-side mic UI
├── models/zipformer_bilingual/
│   └── ... (onnx, tokens.txt)
├── requirements.txt
├── Dockerfile
└── README.md
```
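As an illustration of the first stage handled by `asr_worker.py`, the sketch below turns a binary WebSocket message into samples. It also computes an RMS level of the kind a volume meter could display. It assumes the browser sends raw little-endian Float32 PCM. That wire format and the helper names `decode_frame` and `level_dbfs` are assumptions, not read from `static/index.html`, which may well compute the volume client-side instead.

```python
import numpy as np


def decode_frame(payload: bytes) -> np.ndarray:
    """Interpret a binary WebSocket message as little-endian float32 PCM.

    The wire format is an assumption about what the browser client sends.
    """
    if len(payload) % 4:
        raise ValueError("payload length must be a multiple of 4 bytes")
    samples = np.frombuffer(payload, dtype="<f4")
    # Clamp to the nominal [-1, 1] range; np.clip returns a new, writable array.
    return np.clip(samples, -1.0, 1.0)


def level_dbfs(samples: np.ndarray, floor: float = -100.0) -> float:
    """RMS level in dB relative to full scale (RMS of 1.0 = 0 dBFS)."""
    if samples.size == 0:
        return floor
    rms = float(np.sqrt(np.mean(samples.astype(np.float64) ** 2)))
    if rms == 0.0:
        return floor
    return max(floor, 20.0 * float(np.log10(rms)))
```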

---

## 🔧 Credits

* [Sherpa-ONNX](https://github.com/k2-fsa/sherpa-onnx)
* [OpenCC](https://github.com/BYVoid/OpenCC)
* [FastAPI](https://fastapi.tiangolo.com/)
* [Hugging Face Spaces](https://huggingface.co/docs/hub/spaces)

---

## 🗣 Languages Supported

* 🇨🇳 Chinese (transcribed in Simplified characters, then converted to Traditional)
* 🇺🇸 English
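The Simplified → Traditional step can be kept very small. This sketch assumes the `opencc-python-reimplemented` package, where `OpenCC("s2t").convert(text)` converts Simplified to Traditional Chinese; the package also ships regional configs such as `s2tw` and `s2hk`. English text passes through OpenCC unchanged. The `to_traditional` helper and its injectable `converter` parameter are hypothetical.

```python
from functools import lru_cache
from typing import Callable, Optional


@lru_cache(maxsize=1)
def _default_converter() -> Callable[[str], str]:
    """Build the OpenCC converter once (requires opencc-python-reimplemented)."""
    from opencc import OpenCC  # lazy import: only needed when no converter is injected

    return OpenCC("s2t").convert


def to_traditional(text: str, converter: Optional[Callable[[str], str]] = None) -> str:
    """Convert Simplified Chinese to Traditional; empty text is returned as-is."""
    if not text:
        return text
    convert = converter or _default_converter()
    return convert(text)
```

Caching the converter avoids reloading OpenCC's dictionaries for every partial result. The `converter` parameter lets tests or other language pairs swap in a different function.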

---

## 🤝 Contributing

PRs welcome! Feel free to fork this project and adapt it to other models or languages.

---

## 📜 License

Apache 2.0