Update README.md
README.md
CHANGED
@@ -1,31 +1,28 @@
 # Whisper Streaming Web: Real-time Speech-to-Text with Web UI & FastAPI WebSocket

-This fork of [Whisper Streaming](https://github.com/ufal/whisper_streaming) adds a ready-to-use HTML interface, making it

 ### What's New?

 #### 🌐 **Web & API**
-
-
-

 #### ⚙️ **Core Improvements**
-
-
-
-
-
-
-#### 🔥 **Advanced Features**
-✅ **Real-Time Diarization (Beta)** - Assigns speaker labels dynamically using [Diart](https://github.com/juanmc2005/diart).


-### Web UI
-
-
-<p align="center">
-<img src="src/web/demo.png" alt="Demo Screenshot" width="600">
-</p>

 ## Installation

 # Whisper Streaming Web: Real-time Speech-to-Text with Web UI & FastAPI WebSocket

+This fork of [Whisper Streaming](https://github.com/ufal/whisper_streaming) adds a ready-to-use HTML interface, making it easy to start transcribing audio directly from your browser. Launch the local server, allow microphone access, and start speaking. Everything runs locally on your machine 🎙️✨
+
+<p align="center">
+<img src="src/web/demo.png" alt="Demo Screenshot" width="600">
+</p>

 ### What's New?

 #### 🌐 **Web & API**
+- **Built-in Web UI** - No frontend setup needed; just open your browser and start transcribing.
+- **FastAPI WebSocket Server** - Real-time STT processing with async FFmpeg streaming.
+- **JavaScript Client** - A ready-to-use MediaRecorder implementation that you can copy into your own client.

 #### ⚙️ **Core Improvements**
+- **Buffering Preview** - Displays unvalidated transcription segments for better feedback.
+- **Multi-User Support** - Handles multiple users simultaneously without conflicts.
+- **MLX Whisper Backend** - Optimized for Apple Silicon for faster local processing.
+- **Enhanced Sentence Segmentation** - Improved buffer trimming for better accuracy across languages.
+- **Extended Logging** - More detailed logs to improve debugging and monitoring.

+#### 🎙️ **Advanced Features**
+- **Real-Time Diarization** - Recognize different speakers in real time using [Diart](https://github.com/juanmc2005/diart).

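The Buffering Preview item above describes showing transcription text that has not been validated yet. As a rough illustration only, the sketch below assumes a simple rule: a word counts as validated once two consecutive hypotheses agree on it, and everything after that point is shown as preview. `PreviewBuffer` and `update` are hypothetical names, and the agreement rule is an assumption, not necessarily how this fork decides what to validate.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class PreviewBuffer:
    """Hypothetical helper: splits streaming hypotheses into validated text and a preview."""

    committed: list[str] = field(default_factory=list)  # validated words, never retracted
    previous: list[str] = field(default_factory=list)   # unvalidated tail of the last hypothesis

    def update(self, hypothesis: str) -> tuple[str, str]:
        """Take the newest full hypothesis and return (validated_text, preview_text).

        Assumes each hypothesis covers the whole utterance so far, so it starts
        with the words that were already validated.
        """
        words = hypothesis.split()
        # Skip the words that were validated in earlier updates.
        tail = words[len(self.committed):]
        # Count leading words on which the last two hypotheses agree.
        agreed = 0
        for old, new in zip(self.previous, tail):
            if old != new:
                break
            agreed += 1
        self.committed.extend(tail[:agreed])
        # Whatever is left stays in the preview until a later update confirms it.
        self.previous = tail[agreed:]
        return " ".join(self.committed), " ".join(self.previous)
```

A UI built on this idea could render the first returned string as final text and the second in a lighter style, replacing the preview on every update.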

 ## Installation
