Update README.md
--- a/README.md
+++ b/README.md
@@ -77,17 +77,11 @@ With streaming, the results with different chunk sizes on test-clean are the fol
 
 ## Pipeline description
 
-
-
-This ASR system is composed of 3 different but linked blocks:
-- Tokenizer (unigram) that transforms words into subword units and trained with
-the train transcriptions of LibriSpeech.
-- Neural language model (Transformer LM) trained on the full 10M words dataset.
-- Acoustic model made of a conformer encoder and a joint decoder with CTC +
-transformer. Hence, the decoding also incorporates the CTC probabilities.
+This ASR system is a Conformer model trained with the RNN-T loss (with an auxiliary CTC loss to stabilize training). The model operates with a unigram tokenizer.
+Architecture details are described in the [training hyperparameters file](https://github.com/speechbrain/speechbrain/blob/develop/recipes/LibriSpeech/ASR/transducer/hparams/conformer_transducer.yaml).
 
 The system is trained with recordings sampled at 16kHz (single channel).
-The code will automatically normalize your audio (i.e., resampling + mono channel selection) when calling
+The code will automatically normalize your audio (i.e., resampling + mono channel selection) when calling `transcribe_file` if needed.
 
 ## Install SpeechBrain
 
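The updated wording points readers at `transcribe_file` for inference. As a rough illustration only, the sketch below shows how a streaming Conformer transducer is typically loaded and run through SpeechBrain's `StreamingASR` interface; the model repository id, audio path, and chunk settings are assumptions for the example, not values taken from this README.

```python
# Minimal usage sketch (assumes SpeechBrain >= 1.0; the model id, audio path,
# and chunk settings below are placeholders, not taken from this README).
from speechbrain.inference.ASR import StreamingASR
from speechbrain.utils.dynamic_chunk_training import DynChunkTrainConfig

# Fetch a pretrained streaming Conformer transducer from the Hub
# (replace the source with the actual model repository id).
asr_model = StreamingASR.from_hparams(
    source="speechbrain/asr-streaming-conformer-librispeech",  # assumed repo id
    savedir="pretrained_models/asr-streaming-conformer-librispeech",
)

# transcribe_file resamples to 16 kHz and selects a single channel if needed,
# so an arbitrary wav/flac file can be passed directly.
transcript = asr_model.transcribe_file(
    "example.wav",  # hypothetical audio file
    # Streaming decoding config: chunk size (in frames) and number of
    # left-context chunks; these values are illustrative.
    DynChunkTrainConfig(24, 4),
)
print(transcript)
```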