Update README.md
README.md
CHANGED
@@ -1,26 +1,20 @@
|
|
1 |
-
# Whisper Streaming with FastAPI
|
2 |
|
3 |
-
|
4 |
|
5 |
-
|
6 |
|
7 |
-
|
8 |
|
9 |
-
|
10 |
-
|
11 |
-
|
12 |
-
|
13 |
-
5. **MLX Whisper backend**: Integrates the alternative backend option MLX Whisper, optimized for efficient speech recognition on Apple silicon.
|
14 |
-
|
15 |
-
6. **Diarization (beta)**: Adds speaker labeling in real-time alongside transcription using the [Diart](https://github.com/juanmc2005/diart) library. Each transcription segment is tagged with a speaker.
|
16 |
-
|
17 |
-

|
18 |
-
|
19 |
-
## Code Origins
|
20 |
-
|
21 |
-
This project reuses and extends code from the original Whisper Streaming repository:
|
22 |
-
- whisper_online.py, backends.py and online_asr.py: Contains code from whisper_streaming
|
23 |
-
- silero_vad_iterator.py: Originally from the Silero VAD repository, included in the whisper_streaming project.
|
24 |
|
25 |
## Installation
|
26 |
|
@@ -81,6 +75,7 @@ This project reuses and extends code from the original Whisper Streaming reposit
|
|
81 |
- `--host` and `--port` let you specify the server’s IP/port.
|
82 |
- `--min-chunk-size` sets the minimum chunk size for audio processing. Make sure this value matches the chunk size selected in the frontend. If the two differ, the system still works but may process more audio than necessary.
|
83 |
- For a full list of configurable options, run `python whisper_fastapi_online_server.py -h`
|
|
|
84 |
- `--diarization` (default: False) controls whether diarization runs in parallel with transcription.
|
85 |
- For other parameters, see the [Whisper Streaming](https://github.com/ufal/whisper_streaming) README.
|
86 |
|
|
|
1 |
+
# Whisper Streaming with FastAPI & WebSocket Integration
|
2 |
|
3 |
+
A feature-packed fork of [Whisper Streaming](https://github.com/ufal/whisper_streaming) with **real-time speech-to-text (STT) enhancements**, multi-user support, and a JavaScript client 🎙️✨
|
4 |
|
5 |
+
## What's New?
|
6 |
|
7 |
+
✅ **FastAPI Server with WebSocket Endpoint** – Enables real-time STT in browsers with async FFmpeg processing.
|
8 |
+
✅ **Buffering Preview** – Displays unvalidated buffer content for better streaming feedback.
|
9 |
+
✅ **Multiple Users Support** – The backend handles multiple users simultaneously without conflicts.
|
10 |
+
✅ **HTML/JavaScript Client Implementation** – A plug-and-play MediaRecorder setup for seamless client integration.
|
11 |
+
✅ **MLX Whisper Backend** – Optimized Apple Silicon support for faster local processing.
|
12 |
+
✅ **Enhanced Sentence Segmentation** – Improves buffer trimming and sentence boundary detection in certain languages.
|
13 |
+
✅ **Diarization (Beta)** – Real-time speaker labeling using [Diart](https://github.com/juanmc2005/diart).
|
14 |
|
15 |
+
<p align="center">
|
16 |
+
<img src="src/web/demo.png" alt="Demo Screenshot" width="600">
|
17 |
+
</p>
|
18 |
|
19 |
## Installation
|
20 |
|
|
|
75 |
- `--host` and `--port` let you specify the server’s IP/port.
|
76 |
- `--min-chunk-size` sets the minimum chunk size for audio processing. Make sure this value matches the chunk size selected in the frontend. If the two differ, the system still works but may process more audio than necessary.
|
77 |
- For a full list of configurable options, run `python whisper_fastapi_online_server.py -h`
|
78 |
+
- `--transcription` (default: True) controls whether transcription runs. Set it to False to run only diarization.
|
79 |
- `--diarization` (default: False) controls whether diarization runs in parallel with transcription.
|
80 |
- For other parameters, see the [Whisper Streaming](https://github.com/ufal/whisper_streaming) README.
|
81 |
|
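As a rough illustration of how the `--transcription` and `--diarization` switches listed above could combine, here is a minimal Python sketch using `argparse`. It is not the project's actual parser: `str_to_bool`, `build_parser`, `describe_mode`, and the `--host`/`--port` defaults are hypothetical, and only the flag names and the True/False defaults of the two switches come from the README.

```python
import argparse


def str_to_bool(value: str) -> bool:
    """Parse textual booleans such as 'True', 'false', '1', or 'no'."""
    lowered = value.strip().lower()
    if lowered in {"true", "1", "yes", "y"}:
        return True
    if lowered in {"false", "0", "no", "n"}:
        return False
    raise argparse.ArgumentTypeError(f"expected a boolean, got {value!r}")


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical parser: flag names follow the README, everything else is assumed.
    parser = argparse.ArgumentParser(description="Illustrative server options")
    parser.add_argument("--host", default="localhost")  # assumed default
    parser.add_argument("--port", type=int, default=8000)  # assumed default
    parser.add_argument("--transcription", type=str_to_bool, default=True)
    parser.add_argument("--diarization", type=str_to_bool, default=False)
    return parser


def describe_mode(args: argparse.Namespace) -> str:
    """Summarize which pipelines the parsed flags would enable."""
    if args.transcription and args.diarization:
        return "transcription + diarization"
    if args.transcription:
        return "transcription only"
    if args.diarization:
        return "diarization only"
    return "nothing enabled"
```

In this sketch, `build_parser().parse_args(["--transcription", "False", "--diarization", "True"])` describes a diarization-only run. To see how the real script expects these values, run `python whisper_fastapi_online_server.py -h`.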