Dominik Macháček committed
Commit 260b1f8 · Parent(s): e62fba3

options readme update, and catch exception in server

Files changed (2):
  1. README.md +18 -5
  2. whisper_online_server.py +6 -3
README.md CHANGED
@@ -12,7 +12,7 @@ pip install opus-fast-mosestokenizer
 
 The most recommended backend is [faster-whisper](https://github.com/guillaumekln/faster-whisper) with GPU support. Follow their instructions for NVIDIA libraries -- we succeeded with CUDNN 8.5.0 and CUDA 11.7. Install with `pip install faster-whisper`.
 
-Alternative, less restrictive, but slowe backend is [whisper-timestamped](https://github.com/linto-ai/whisper-timestamped): `pip install git+https://github.com/linto-ai/whisper-timestamped`
+Alternative, less restrictive, but slower backend is [whisper-timestamped](https://github.com/linto-ai/whisper-timestamped): `pip install git+https://github.com/linto-ai/whisper-timestamped`
 
 The backend is loaded only when chosen. The unused one does not have to be installed.
 
@@ -22,7 +22,7 @@ The backend is loaded only when chosen. The unused one does not have to be insta
 
 ```
 usage: whisper_online.py [-h] [--min-chunk-size MIN_CHUNK_SIZE] [--model {tiny.en,tiny,base.en,base,small.en,small,medium.en,medium,large-v1,large-v2,large}] [--model_cache_dir MODEL_CACHE_DIR] [--model_dir MODEL_DIR] [--lan LAN] [--task {transcribe,translate}]
-                         [--start_at START_AT] [--backend {faster-whisper,whisper_timestamped}] [--offline] [--vad]
+                         [--start_at START_AT] [--backend {faster-whisper,whisper_timestamped}] [--offline] [--comp_unaware] [--vad]
                          audio_path
 
 positional arguments:
@@ -46,7 +46,8 @@ options:
   --backend {faster-whisper,whisper_timestamped}
                         Load only this backend for Whisper processing.
   --offline             Offline mode.
-  --vad                 Use VAD = voice activity detection, with the default parameters.
+  --comp_unaware        Computationally unaware simulation.
+  --vad                 Use VAD = voice activity detection, with the default parameters.
 ```
 
 Example:
@@ -57,6 +58,18 @@ It simulates realtime processing from a pre-recorded mono 16k wav file.
 python3 whisper_online.py en-demo16.wav --language en --min-chunk-size 1 > out.txt
 ```
 
+Simulation modes:
+
+- default mode, no special option: real-time simulation from file, computationally aware. The chunk size is `MIN_CHUNK_SIZE` or larger, if more audio arrived during the last update computation.
+
+- `--comp_unaware` option: computationally unaware simulation. It means that the timer that counts the emission times "stops" when the model is computing. The chunk size is always `MIN_CHUNK_SIZE`. The latency is caused only by the model being unable to confirm the output, e.g. because of language ambiguity etc., and not because of slow hardware or suboptimal implementation. We implement this feature to find the lower bound for latency.
+
+- `--start_at START_AT`: Start processing audio at this time. The first update receives the whole audio up to `START_AT`. It is useful for debugging, e.g. when we observe a bug at a specific time in the audio file and want to reproduce it quickly, without long waiting.
+
+- `--offline` option: It processes the whole audio file at once, in offline mode. We implement it to find out the lowest possible WER on the given audio file.
+
+
+
 ### Output format
 
 ```
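The difference between the two timing modes is subtle, so here is a toy sketch of the bookkeeping. It assumes a stand-in `transcribe(end)` callable representing one update of the online processor; it is an illustration only, not the actual code of `whisper_online.py`:

```python
import time

def simulate(audio_duration, transcribe, min_chunk=1.0, comp_unaware=False):
    """Toy model of the two simulation modes over a pre-recorded file."""
    emissions = []   # (emission_time_in_seconds, output) pairs
    end = min_chunk  # end of the audio prefix processed so far
    while end <= audio_duration:
        t0 = time.time()
        out = transcribe(end)  # one update over audio[0:end]
        compute = time.time() - t0
        if comp_unaware:
            # --comp_unaware: the emission timer "stops" while the model
            # computes; outputs are stamped with audio time only, and the
            # chunk size is always min_chunk
            emissions.append((end, out))
            end += min_chunk
        else:
            # default mode: compute time counts toward latency, and the
            # next chunk grows to cover audio that arrived while the
            # model was busy
            now = end + compute
            emissions.append((now, out))
            end = max(end + min_chunk, now)
    return emissions

# e.g. with a stub recognizer:
# simulate(10.0, lambda end: f"prefix up to {end:.1f}s")
```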
@@ -114,7 +127,7 @@ online.init() # refresh if you're going to re-use the object for the next audio
 
 ### Server
 
-`whisper_online_server.py` has the same model options as `whisper_online.py`, plus `--host` and `--port` of the TCP connection.
+`whisper_online_server.py` has the same model options as `whisper_online.py`, plus `--host` and `--port` of the TCP connection. See the help message (`-h` option).
 
 Client example:
 
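Before the client connects, the server must be running. A typical launch might look like the following (the flags other than `--host` and `--port` mirror `whisper_online.py`; check `-h` for the authoritative list):

```
python3 whisper_online_server.py --language en --min-chunk-size 1 --host localhost --port 43001
```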
@@ -122,7 +135,7 @@ Client example:
 arecord -f S16_LE -c1 -r 16000 -t raw -D default | nc localhost 43001
 ```
 
-- arecord sends realtime audio from a sound device, in raw audio format -- 16000 sampling rate, mono channel, S16\_LE -- signed 16-bit integer low endian. (use the alternative to arecord that works for you)
+- arecord sends realtime audio from a sound device (e.g. mic), in raw audio format -- 16000 sampling rate, mono channel, S16\_LE -- signed 16-bit integer little endian. (use the alternative to arecord that works for you)
 
 - nc is netcat with server's host and port
 
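On platforms without `arecord`, a small Python client can stream the same raw format over TCP. A minimal sketch, assuming a pre-recorded 16 kHz mono S16_LE wav file (the file name is illustrative):

```python
import socket
import time
import wave

# Stand-in for `arecord ... | nc localhost 43001`: streams a wav file
# to the server in roughly real time, one second of audio per send.
HOST, PORT = "localhost", 43001
CHUNK_FRAMES = 16000  # 1 second of audio at 16 kHz

with wave.open("en-demo16.wav", "rb") as w, \
        socket.create_connection((HOST, PORT)) as sock:
    # the server expects raw S16_LE, mono, 16 kHz -- verify before sending
    assert (w.getframerate(), w.getnchannels(), w.getsampwidth()) == (16000, 1, 2)
    while True:
        frames = w.readframes(CHUNK_FRAMES)
        if not frames:
            break
        sock.sendall(frames)  # wav data frames are already raw S16_LE bytes
        time.sleep(1.0)       # pace the stream at real-time speed
```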
whisper_online_server.py CHANGED
@@ -20,9 +20,7 @@ parser.add_argument('--model_cache_dir', type=str, default=None, help="Overridin
 parser.add_argument('--model_dir', type=str, default=None, help="Dir where Whisper model.bin and other files are saved. This option overrides --model and --model_cache_dir parameter.")
 parser.add_argument('--lan', '--language', type=str, default='en', help="Language code for transcription, e.g. en,de,cs.")
 parser.add_argument('--task', type=str, default='transcribe', choices=["transcribe","translate"],help="Transcribe or translate.")
-parser.add_argument('--start_at', type=float, default=0.0, help='Start processing audio at this time.')
 parser.add_argument('--backend', type=str, default="faster-whisper", choices=["faster-whisper", "whisper_timestamped"],help='Load only this backend for Whisper processing.')
-parser.add_argument('--offline', action="store_true", default=False, help='Offline mode.')
 parser.add_argument('--vad', action="store_true", default=False, help='Use VAD = voice activity detection, with the default parameters.')
 args = parser.parse_args()
 
@@ -183,7 +181,12 @@ class ServerProcessor:
                 break
             self.online_asr_proc.insert_audio_chunk(a)
             o = online.process_iter()
-            self.send_result(o)
+            try:
+                self.send_result(o)
+            except BrokenPipeError:
+                print("broken pipe -- connection closed?",file=sys.stderr)
+                break
+
         # o = online.finish() # this should be working
         # self.send_result(o)
 
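The added `try`/`except` handles a client that disconnects while results are still being produced: the next write to the closed socket raises `BrokenPipeError`, which previously crashed the serving loop. The same pattern in standalone form (the `send_line` helper below is hypothetical, not the server's actual method):

```python
import socket
import sys

def send_line(conn: socket.socket, line: str) -> bool:
    """Send one result line; return False if the peer has gone away."""
    try:
        conn.sendall((line + "\n").encode("utf-8"))
        return True
    except BrokenPipeError:
        # the client closed its end of the connection; log and let the
        # caller stop serving instead of crashing
        print("broken pipe -- connection closed?", file=sys.stderr)
        return False
```

Breaking out of the loop, as the commit does, stops processing for that client cleanly instead of letting the exception propagate.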