Quentin Fuxa committed on
Commit · 35b86bd
1 Parent(s): d9feb41
Update README.md
README.md
CHANGED
@@ -3,30 +3,25 @@
 This project is based on [Whisper Streaming](https://github.com/ufal/whisper_streaming) and lets you transcribe audio directly from your browser. Simply launch the local server and grant microphone access. Everything runs locally on your machine
 
 <p align="center">
-<img src="web/demo.png" alt="Demo Screenshot" width="
+<img src="web/demo.png" alt="Demo Screenshot" width="730">
 </p>
 
 ### Differences from [Whisper Streaming](https://github.com/ufal/whisper_streaming)
 
 #### **Core Improvements**
-- **Buffering Preview** – Displays unvalidated transcription segments
-- **Multi-User Support** – Handles multiple users simultaneously
+- **Buffering Preview** – Displays unvalidated transcription segments
+- **Multi-User Support** – Handles multiple users simultaneously by decoupling backend and online asr
 - **MLX Whisper Backend** – Optimized for Apple Silicon for faster local processing.
-- **Enhanced Sentence Segmentation** – Improved buffer trimming for better accuracy across languages.
 - **Confidence validation** – Immediately validate high-confidence tokens for faster inference
 
 #### **Speaker Identification**
-- **Real-Time Diarization** – Identify different speakers in real time using [Diart](https://github.com/juanmc2005/diart)
+- **Real-Time Diarization** – Identify different speakers in real time using [Diart](https://github.com/juanmc2005/diart)
 
 #### **Web & API**
-- **Built-in Web UI** – Simple browser interface with no frontend setup required
+- **Built-in Web UI** – Simple raw html browser interface with no frontend setup required
 - **FastAPI WebSocket Server** – Real-time speech-to-text processing with async FFmpeg streaming.
 - **JavaScript Client** – Ready-to-use MediaRecorder implementation for seamless client-side integration.
 
-#### **Coming Soon**
-
-- **Enhanced Diarization Performance** – Optimize speaker identification by implementing longer steps for Diart processing and leveraging language-specific segmentation patterns to improve speaker boundary detection
-
 
 ## Installation
 
@@ -86,6 +81,8 @@ This project is based on [Whisper Streaming](https://github.com/ufal/whisper_str
 python whisper_fastapi_online_server.py --host 0.0.0.0 --port 8000
 ```
 
+**Parameters**
+
 All [Whisper Streaming](https://github.com/ufal/whisper_streaming) parameters are supported.
 Additional parameters:
 - `--host` and `--port` let you specify the server's IP/port.
@@ -94,7 +91,7 @@ This project is based on [Whisper Streaming](https://github.com/ufal/whisper_str
 - `--diarization`: Enable/disable speaker diarization (default: False)
 - `--confidence-validation`: Use confidence scores for faster validation. Transcription will be faster but punctuation might be less accurate (default: True)
 
-
+5. **Open the Provided HTML**:
 
 - By default, the server root endpoint `/` serves a simple `live_transcription.html` page.
 - Open your browser at `http://localhost:8000` (or replace `localhost` and `8000` with whatever you specified).