imedennikov committed · verified · Commit 90bcbde · Parent(s): e74901b

Update README.md

Files changed (1): README.md (+5 −3)
README.md CHANGED
```diff
@@ -59,7 +59,7 @@ model-index:
        args:
          language: ja
      metrics:
-     - name: Test CER
+     - name: Dev CER
        type: cer
        value: 10.1
    - task:
@@ -156,7 +156,9 @@ This model provides transcribed speech as a string for a given audio sample.
 
 ## Model Architecture
 
-This model uses a Hybrid FastConformer-TDT-CTC architecture. FastConformer [1] is an optimized version of the Conformer model with 8x depthwise-separable convolutional downsampling. You may find more information on the details of FastConformer here: [Fast-Conformer Model](https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/main/asr/models.html#fast-conformer).
+This model uses a Hybrid FastConformer-TDT-CTC architecture.
+
+FastConformer [1] is an optimized version of the Conformer model with 8x depthwise-separable convolutional downsampling. You may find more information on the details of FastConformer here: [Fast-Conformer Model](https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/main/asr/models.html#fast-conformer).
 
 TDT (Token-and-Duration Transducer) [2] is a generalization of conventional Transducers by decoupling token and duration predictions. Unlike conventional Transducers which produces a lot of blanks during inference, a TDT model can skip majority of blank predictions by using the duration output (up to 4 frames for this ja-parakeet-tdt_ctc-0.6b model), thus brings significant inference speed-up. The detail of TDT can be found here: [Efficient Sequence Transduction by Jointly Predicting Tokens and Durations](https://arxiv.org/abs/2304.06795).
 
@@ -175,6 +177,7 @@ The model was trained on ReazonSpeech v2.0 [5] speech corpus containing more tha
 ## Performance
 
 The following table summarizes the performance of this model in terms of Character Error Rate (CER%).
+
 In CER calculation, punctuation marks and non-alphabet characters are removed, and number are transformed to words using `num2words` library [6].
 
 |**Version**|**Decoder**|**JSUT basic5000**|**MCV 8.0 test**|**MCV 16.1 dev**|**MCV16.1 test**|**TEDxJP-10k**|
@@ -184,7 +187,6 @@ In CER calculation, punctuation marks and non-alphabet characters are removed, a
 
 These are greedy CER numbers without external LM.
 
-
 ## NVIDIA Riva: Deployment
 
 [NVIDIA Riva](https://developer.nvidia.com/riva), is an accelerated speech AI SDK deployable on-prem, in all clouds, multi-cloud, hybrid, on edge, and embedded.
```
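The CER preprocessing the README describes (strip punctuation and non-letter characters, then compute character error rate against the reference) can be sketched roughly as below. This is a hedged illustration, not the card's actual evaluation script: the `normalize` helper is an assumption, and the `num2words` number-spelling step mentioned in the README is omitted here to keep the sketch self-contained.

```python
import unicodedata


def normalize(text: str) -> str:
    # Keep only letters and digits (Unicode categories L* and N*);
    # a rough stand-in for the card's punctuation/symbol stripping.
    # The real pipeline additionally spells out numbers with num2words.
    return "".join(ch for ch in text if unicodedata.category(ch)[0] in ("L", "N"))


def cer(ref: str, hyp: str) -> float:
    """Character Error Rate: Levenshtein distance / reference length."""
    r, h = normalize(ref), normalize(hyp)
    # Classic dynamic-programming edit distance over characters.
    prev = list(range(len(h) + 1))
    for i, rc in enumerate(r, 1):
        cur = [i]
        for j, hc in enumerate(h, 1):
            cur.append(min(prev[j] + 1,          # deletion
                           cur[j - 1] + 1,       # insertion
                           prev[j - 1] + (rc != hc)))  # substitution
        prev = cur
    return prev[-1] / max(len(r), 1)


# One substituted character out of five after punctuation removal -> CER 0.2
print(cer("こんにちは。", "こんにちわ"))
```

A CER of 10.1 in the table above thus means roughly one character error per ten reference characters after this normalization.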