langtech-innovation/WhisperLiveKitDiarization

Dominik Macháček committed on Nov 28, 2023

Commit bd0d848

2 parents: 18c1434, 878f11c

Merge branch 'main' into TIAGo-WE-COBOT

Files changed (1): README.md

@@ -42,7 +42,7 @@ The unused one does not have to be installed. We integrate the following segment
 ## Usage
-### Realtime simulation from audio file
+### Real-time simulation from audio file
 ```
 usage: whisper_online.py [-h] [--min-chunk-size MIN_CHUNK_SIZE] [--model {tiny.en,tiny,base.en,base,small.en,small,medium.en,medium,large-v1,large-v2,large}] [--model_cache_dir MODEL_CACHE_DIR] [--model_dir MODEL_DIR] [--lan LAN] [--task {transcribe,translate}]
@@ -126,14 +126,14 @@ from whisper_online import *
 src_lan = "en"  # source language
 tgt_lan = "en"  # target language  -- same as source for ASR, "en" if translate task is used
 asr = FasterWhisperASR(lan, "large-v2")  # loads and wraps Whisper model
 # set options:
 # asr.set_translate_task()  # it will translate from lan into English
-# asr.use_vad()  # set using VAD
-online = OnlineASRProcessor(tgt_lan, asr)  # create processing object
+# asr.use_vad()  # set using VAD
+tokenizer = create_tokenizer(tgt_lan)  # sentence segmenter for the target language
+online = OnlineASRProcessor(asr, tokenizer)  # create processing object
 while audio_has_not_ended:   # processing loop:
@@ -149,7 +149,7 @@ print(o)  # do something with the last output
 online.init()  # refresh if you're going to re-use the object for the next audio
 ```
-### Server
+### Server -- real-time from mic
 `whisper_online_server.py` has the same model options as `whisper_online.py`, plus `--host` and `--port` of the TCP connection. See help message (`-h` option).
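
Note on the server section: the changed line says only that the server listens on a TCP `--host` and `--port`; it does not describe the wire format. The sketch below shows one way a client might push audio to such a socket. It assumes the server accepts raw 16 kHz mono 16-bit PCM bytes, which this diff does not confirm. The helper names (`pcm_chunks`, `send_chunks`, `stream_to_server`) are hypothetical and not part of `whisper_online`.

```python
import socket
from typing import Iterable, Iterator

# Assumed audio format (not stated in the diff): 16 kHz, mono, 16-bit PCM.
SAMPLE_RATE = 16000
BYTES_PER_SAMPLE = 2


def pcm_chunks(pcm: bytes, chunk_seconds: float = 1.0) -> Iterator[bytes]:
    """Split a raw PCM byte string into chunks of roughly chunk_seconds each."""
    step = int(SAMPLE_RATE * chunk_seconds) * BYTES_PER_SAMPLE
    for start in range(0, len(pcm), step):
        yield pcm[start:start + step]


def send_chunks(conn: socket.socket, chunks: Iterable[bytes]) -> int:
    """Write every chunk to an already connected socket; return bytes sent."""
    sent = 0
    for chunk in chunks:
        conn.sendall(chunk)
        sent += len(chunk)
    return sent


def stream_to_server(pcm: bytes, host: str, port: int) -> int:
    """Connect to host:port (the server's --host/--port) and stream the audio."""
    with socket.create_connection((host, port)) as conn:
        return send_chunks(conn, pcm_chunks(pcm))
```

A call would look like `stream_to_server(raw_audio, host, port)`, with `host` and `port` matching the values passed to `whisper_online_server.py`. Sending in fixed-size chunks rather than one large write mimics audio arriving from a microphone; a real-time client would also pace the chunks to wall-clock time.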