Update README.md
README.md (CHANGED)
```diff
@@ -59,7 +59,7 @@ model-index:
       args:
         language: ja
     metrics:
-    - name:
+    - name: Dev CER
       type: cer
       value: 10.1
   - task:
@@ -156,7 +156,9 @@ This model provides transcribed speech as a string for a given audio sample.
 
 ## Model Architecture
 
-This model uses a Hybrid FastConformer-TDT-CTC architecture.
+This model uses a Hybrid FastConformer-TDT-CTC architecture.
+
+FastConformer [1] is an optimized version of the Conformer model with 8x depthwise-separable convolutional downsampling. You may find more information on the details of FastConformer here: [Fast-Conformer Model](https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/main/asr/models.html#fast-conformer).
 
 TDT (Token-and-Duration Transducer) [2] is a generalization of conventional Transducers by decoupling token and duration predictions. Unlike conventional Transducers which produces a lot of blanks during inference, a TDT model can skip majority of blank predictions by using the duration output (up to 4 frames for this ja-parakeet-tdt_ctc-0.6b model), thus brings significant inference speed-up. The detail of TDT can be found here: [Efficient Sequence Transduction by Jointly Predicting Tokens and Durations](https://arxiv.org/abs/2304.06795).
 
@@ -175,6 +177,7 @@ The model was trained on ReazonSpeech v2.0 [5] speech corpus containing more tha
 ## Performance
 
 The following table summarizes the performance of this model in terms of Character Error Rate (CER%).
+
 In CER calculation, punctuation marks and non-alphabet characters are removed, and number are transformed to words using `num2words` library [6].
 
 |**Version**|**Decoder**|**JSUT basic5000**|**MCV 8.0 test**|**MCV 16.1 dev**|**MCV16.1 test**|**TEDxJP-10k**|
@@ -184,7 +187,6 @@ In CER calculation, punctuation marks and non-alphabet characters are removed, a
 
 These are greedy CER numbers without external LM.
 
-
 ## NVIDIA Riva: Deployment
 
 [NVIDIA Riva](https://developer.nvidia.com/riva), is an accelerated speech AI SDK deployable on-prem, in all clouds, multi-cloud, hybrid, on edge, and embedded.
```