Update README.md
README.md (CHANGED)
```diff
@@ -59,7 +59,7 @@ model-index:
       args:
         language: ja
     metrics:
-    - name:
+    - name: Dev CER
       type: cer
       value: 10.1
   - task:
@@ -156,7 +156,9 @@ This model provides transcribed speech as a string for a given audio sample.
 
 ## Model Architecture
 
-This model uses a Hybrid FastConformer-TDT-CTC architecture.
+This model uses a Hybrid FastConformer-TDT-CTC architecture.
+
+FastConformer [1] is an optimized version of the Conformer model with 8x depthwise-separable convolutional downsampling. You may find more information on the details of FastConformer here: [Fast-Conformer Model](https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/main/asr/models.html#fast-conformer).
 
 TDT (Token-and-Duration Transducer) [2] is a generalization of conventional Transducers by decoupling token and duration predictions. Unlike conventional Transducers which produces a lot of blanks during inference, a TDT model can skip majority of blank predictions by using the duration output (up to 4 frames for this ja-parakeet-tdt_ctc-0.6b model), thus brings significant inference speed-up. The detail of TDT can be found here: [Efficient Sequence Transduction by Jointly Predicting Tokens and Durations](https://arxiv.org/abs/2304.06795).
 
@@ -175,6 +177,7 @@ The model was trained on ReazonSpeech v2.0 [5] speech corpus containing more tha
 ## Performance
 
 The following table summarizes the performance of this model in terms of Character Error Rate (CER%).
+
 In CER calculation, punctuation marks and non-alphabet characters are removed, and number are transformed to words using `num2words` library [6].
 
 |**Version**|**Decoder**|**JSUT basic5000**|**MCV 8.0 test**|**MCV 16.1 dev**|**MCV16.1 test**|**TEDxJP-10k**|
@@ -184,7 +187,6 @@ In CER calculation, punctuation marks and non-alphabet characters are removed, a
 
 These are greedy CER numbers without external LM.
 
-
 ## NVIDIA Riva: Deployment
 
 [NVIDIA Riva](https://developer.nvidia.com/riva), is an accelerated speech AI SDK deployable on-prem, in all clouds, multi-cloud, hybrid, on edge, and embedded.
```