Quentin Fuxa committed on
Commit
35b86bd
·
1 Parent(s): d9feb41

Update README.md

Files changed (1)
  1. README.md +8 -11
README.md CHANGED
@@ -3,30 +3,25 @@
3
  This project is based on [Whisper Streaming](https://github.com/ufal/whisper_streaming) and lets you transcribe audio directly from your browser. Simply launch the local server and grant microphone access. Everything runs locally on your machine ✨
4
 
5
  <p align="center">
6
- <img src="web/demo.png" alt="Demo Screenshot" width="600">
7
  </p>
8
 
9
  ### Differences from [Whisper Streaming](https://github.com/ufal/whisper_streaming)
10
 
11
  #### ⚙️ **Core Improvements**
12
- - **Buffering Preview** – Displays unvalidated transcription segments for immediate feedback.
13
- - **Multi-User Support** – Handles multiple users simultaneously without conflicts.
14
  - **MLX Whisper Backend** – Optimized for Apple Silicon for faster local processing.
15
- - **Enhanced Sentence Segmentation** – Improved buffer trimming for better accuracy across languages.
16
  - **Confidence validation** – Immediately validate high-confidence tokens for faster inference
17
 
18
  #### 🎙️ **Speaker Identification**
19
- - **Real-Time Diarization** – Identify different speakers in real time using [Diart](https://github.com/juanmc2005/diart).
20
 
21
  #### 🌐 **Web & API**
22
- - **Built-in Web UI** – Simple browser interface with no frontend setup required
23
  - **FastAPI WebSocket Server** – Real-time speech-to-text processing with async FFmpeg streaming.
24
  - **JavaScript Client** – Ready-to-use MediaRecorder implementation for seamless client-side integration.
25
 
26
- #### 🚀 **Coming Soon**
27
-
28
- - **Enhanced Diarization Performance** – Optimize speaker identification by implementing longer steps for Diart processing and leveraging language-specific segmentation patterns to improve speaker boundary detection
29
-
30
 
31
  ## Installation
32
 
@@ -86,6 +81,8 @@ This project is based on [Whisper Streaming](https://github.com/ufal/whisper_str
86
  python whisper_fastapi_online_server.py --host 0.0.0.0 --port 8000
87
  ```
88
 
 
 
89
  All [Whisper Streaming](https://github.com/ufal/whisper_streaming) parameters are supported.
90
  Additional parameters:
91
  - `--host` and `--port` let you specify the server’s IP/port.
@@ -94,7 +91,7 @@ This project is based on [Whisper Streaming](https://github.com/ufal/whisper_str
94
  - `--diarization`: Enable/disable speaker diarization (default: False)
95
  - `--confidence-validation`: Use confidence scores for faster validation. Transcription will be faster but punctuation might be less accurate (default: True)
96
 
97
- 4. **Open the Provided HTML**:
98
 
99
  - By default, the server root endpoint `/` serves a simple `live_transcription.html` page.
100
  - Open your browser at `http://localhost:8000` (or replace `localhost` and `8000` with whatever you specified).
 
3
  This project is based on [Whisper Streaming](https://github.com/ufal/whisper_streaming) and lets you transcribe audio directly from your browser. Simply launch the local server and grant microphone access. Everything runs locally on your machine ✨
4
 
5
  <p align="center">
6
+ <img src="web/demo.png" alt="Demo Screenshot" width="730">
7
  </p>
8
 
9
  ### Differences from [Whisper Streaming](https://github.com/ufal/whisper_streaming)
10
 
11
  #### ⚙️ **Core Improvements**
12
+ - **Buffering Preview** – Displays unvalidated transcription segments
13
+ - **Multi-User Support** – Handles multiple users simultaneously by decoupling backend and online asr
14
  - **MLX Whisper Backend** – Optimized for Apple Silicon for faster local processing.
 
15
  - **Confidence validation** – Immediately validate high-confidence tokens for faster inference
16
 
17
  #### 🎙️ **Speaker Identification**
18
+ - **Real-Time Diarization** – Identify different speakers in real time using [Diart](https://github.com/juanmc2005/diart)
19
 
20
  #### 🌐 **Web & API**
21
+ - **Built-in Web UI** – Simple raw html browser interface with no frontend setup required
22
  - **FastAPI WebSocket Server** – Real-time speech-to-text processing with async FFmpeg streaming.
23
  - **JavaScript Client** – Ready-to-use MediaRecorder implementation for seamless client-side integration.
24
 
 
 
 
 
25
 
26
  ## Installation
27
 
 
81
  python whisper_fastapi_online_server.py --host 0.0.0.0 --port 8000
82
  ```
83
 
84
+ **Parameters**
85
+
86
  All [Whisper Streaming](https://github.com/ufal/whisper_streaming) parameters are supported.
87
  Additional parameters:
88
  - `--host` and `--port` let you specify the server’s IP/port.
 
91
  - `--diarization`: Enable/disable speaker diarization (default: False)
92
  - `--confidence-validation`: Use confidence scores for faster validation. Transcription will be faster but punctuation might be less accurate (default: True)
93
 
94
+ 5. **Open the Provided HTML**:
95
 
96
  - By default, the server root endpoint `/` serves a simple `live_transcription.html` page.
97
  - Open your browser at `http://localhost:8000` (or replace `localhost` and `8000` with whatever you specified).
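
The launch command and the browser URL in the README share the same host and port, so they can be derived from one place. The sketch below is only an illustration and is not part of the commit: `build_server_command` and `browser_url` are hypothetical helper names, and the `localhost`/`8000` defaults are taken from the example URL rather than from the server's actual defaults. It also assumes that a bind-all address such as `0.0.0.0` should be browsed via `localhost`.

```python
import shlex

# Script name as it appears in the README's launch command.
SERVER_SCRIPT = "whisper_fastapi_online_server.py"


def build_server_command(host: str = "localhost", port: int = 8000) -> list[str]:
    """Return the argv list for launching the server with --host and --port.

    Hypothetical helper: the defaults mirror the README's example URL,
    not necessarily the server's own defaults.
    """
    if not 0 < port < 65536:
        raise ValueError(f"port out of range: {port}")
    return ["python", SERVER_SCRIPT, "--host", host, "--port", str(port)]


def browser_url(host: str = "localhost", port: int = 8000) -> str:
    """Return the URL of the root endpoint `/` for the given host and port."""
    # 0.0.0.0 (or ::) means "listen on all interfaces"; it is not a
    # destination address, so point the browser at localhost instead.
    visible_host = "localhost" if host in ("0.0.0.0", "::") else host
    return f"http://{visible_host}:{port}"


if __name__ == "__main__":
    # Print the shell command and the matching URL for the README's example.
    print(shlex.join(build_server_command("0.0.0.0", 8000)))
    print(browser_url("0.0.0.0", 8000))
```

Passing the printed command to a shell (or the list to `subprocess.run`) starts the server; the second line is the address to open once it is listening.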