# HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis

Based on the script [`train_hifigan.py`](https://github.com/tensorspeech/TensorFlowTTS/tree/master/examples/hifigan/train_hifigan.py).

## Training HiFi-GAN from scratch with the LJSpeech dataset

This example shows how to train HiFi-GAN from scratch with TensorFlow 2, using a custom training loop and `tf.function`. The data used for this example is LJSpeech, which you can download [here](https://keithito.com/LJ-Speech-Dataset/).

### Step 1: Create a TensorFlow-based dataloader (`tf.data.Dataset`)

First, define a data loader based on the `AbstractDataset` class (see [`abstract_dataset.py`](https://github.com/tensorspeech/TensorFlowTTS/tree/master/tensorflow_tts/datasets/abstract_dataset.py)). In this example, the dataloader reads the dataset from a path, and I use file suffixes to tell audio files from mel-spectrogram files (see [`audio_mel_dataset.py`](https://github.com/tensorspeech/TensorFlowTTS/tree/master/examples/melgan/audio_mel_dataset.py)). If you already have a preprocessed version of your target dataset, you don't need to use this example dataloader; just refer to it and modify the **generator function** to fit your case. Normally, a generator function should return [audio, mel].
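
As a rough illustration of the suffix-based approach, here is a minimal sketch of a generator function that pairs audio and mel files by utterance ID and yields [audio, mel]. It is not the code from `audio_mel_dataset.py`: the function name, the `-wave.npy` and `-raw-feats.npy` suffixes, and the `load_fn` parameter are assumptions made for this example, so adapt them to how your files are actually named and stored.

```python
import os
from glob import glob


def audio_mel_generator(root_dir, audio_suffix="-wave.npy",
                        mel_suffix="-raw-feats.npy", load_fn=None):
    """Yield [audio, mel] for every utterance that has both files.

    The suffixes and `load_fn` are hypothetical defaults for this sketch.
    """
    if load_fn is None:
        # Assumption: features are stored as NumPy arrays on disk.
        import numpy as np
        load_fn = np.load

    def index_by_id(suffix):
        # Map utterance ID (file name minus the suffix) to its path.
        paths = glob(os.path.join(root_dir, "*" + suffix))
        return {os.path.basename(p)[: -len(suffix)]: p for p in paths}

    audio_files = index_by_id(audio_suffix)
    mel_files = index_by_id(mel_suffix)

    # Keep only IDs present in both sets, in a stable (sorted) order.
    for utt_id in sorted(audio_files.keys() & mel_files.keys()):
        yield [load_fn(audio_files[utt_id]), load_fn(mel_files[utt_id])]
```

A generator like this can then be wrapped with `tf.data.Dataset.from_generator` to build the input pipeline; the output signature you pass there depends on your audio and mel shapes.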

### Step 2: Training from scratch

After you redefine your dataloader, please modify the input arguments, `train_dataset`, and `valid_dataset` in [`train_hifigan.py`](https://github.com/tensorspeech/TensorFlowTTS/tree/master/examples/hifigan/train_hifigan.py). Here are example commands for training HiFi-GAN from scratch:

First, train the generator with only the STFT loss:

```bash
CUDA_VISIBLE_DEVICES=0 python examples/hifigan/train_hifigan.py \
  --train-dir ./dump/train/ \
  --dev-dir ./dump/valid/ \
  --outdir ./examples/hifigan/exp/train.hifigan.v1/ \
  --config ./examples/hifigan/conf/hifigan.v1.yaml \
  --use-norm 1 \
  --generator_mixed_precision 1 \
  --resume ""
```

Then resume from that checkpoint and train the generator and discriminator together:

```bash
CUDA_VISIBLE_DEVICES=0 python examples/hifigan/train_hifigan.py \
  --train-dir ./dump/train/ \
  --dev-dir ./dump/valid/ \
  --outdir ./examples/hifigan/exp/train.hifigan.v1/ \
  --config ./examples/hifigan/conf/hifigan.v1.yaml \
  --use-norm 1 \
  --resume ./examples/hifigan/exp/train.hifigan.v1/checkpoints/ckpt-100000
```

To train on multiple GPUs, replace `CUDA_VISIBLE_DEVICES=0` with, for example, `CUDA_VISIBLE_DEVICES=0,1,2,3`. You also need to tune the per-GPU `batch_size` (in the config file) yourself to maximize performance. Note that multi-GPU is currently supported for training but not yet for decoding.
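
When tuning `batch_size`, it can help to keep track of the effective batch size across all GPUs. The sketch below assumes that `batch_size` in the config is a per-GPU value and that every GPU processes that many examples per step; both helper functions are hypothetical and are not part of `train_hifigan.py`.

```python
import os


def visible_gpu_count(env=None):
    """Count the device IDs listed in CUDA_VISIBLE_DEVICES.

    Returns 0 when the variable is unset or empty. Note that an unset
    variable does not restrict the visible GPUs, so 0 here only means
    "no explicit list was given".
    """
    env = os.environ if env is None else env
    value = env.get("CUDA_VISIBLE_DEVICES", "")
    return len([d for d in value.split(",") if d.strip()])


def global_batch_size(per_gpu_batch_size, num_gpus):
    # Assumption: each GPU processes `per_gpu_batch_size` examples per step.
    return per_gpu_batch_size * max(num_gpus, 1)


if __name__ == "__main__":
    n = visible_gpu_count({"CUDA_VISIBLE_DEVICES": "0,1,2,3"})
    print(n, global_batch_size(16, n))
```

Under these assumptions, a per-GPU `batch_size` of 16 on four GPUs gives 64 examples per step, which is worth keeping in mind when comparing runs with different GPU counts.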

To resume training from a checkpoint, pass `--resume` as in the example below:

```bash
--resume ./examples/hifigan/exp/train.hifigan.v1/checkpoints/ckpt-100000
```

To fine-tune a model, pass `--pretrained` with the filename of the generator, like this:

```bash
--pretrained ptgenerator.h5
```

**IMPORTANT NOTES**:

- When training the generator only, we enable mixed precision to speed up training.
- We don't apply mixed precision when training both the generator and the discriminator. (The discriminator includes group convolutions, which make it slower when mixed precision is enabled.)
- The 100k in `ckpt-100000` is the *discriminator_train_start_steps* parameter from [hifigan.v1.yaml](https://github.com/tensorspeech/TensorflowTTS/tree/master/examples/hifigan/conf/hifigan.v1.yaml)
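
The notes above amount to a two-stage schedule keyed on the step count. The following is a minimal sketch of that logic, not the trainer's actual code: `training_phase` is a hypothetical helper, and treating the boundary step itself as the start of discriminator training is an assumption. In practice mixed precision is chosen per run through the `--generator_mixed_precision` flag rather than switched automatically.

```python
def training_phase(step, discriminator_train_start_steps=100_000):
    """Describe what is trained at a given global step (illustrative only)."""
    # Before the threshold, only the generator is trained (STFT loss only).
    generator_only = step < discriminator_train_start_steps
    return {
        "train_generator": True,
        "train_discriminator": not generator_only,
        # Mixed precision is recommended only for the generator-only stage.
        "generator_mixed_precision": generator_only,
    }


if __name__ == "__main__":
    for step in (0, 99_999, 100_000):
        print(step, training_phase(step))
```

This is why the second command resumes from `ckpt-100000` without `--generator_mixed_precision 1`: by that step the generator-only stage has finished.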

## Reference

1. [descriptinc/melgan-neurips](https://github.com/descriptinc/melgan-neurips)
2. [kan-bayashi/ParallelWaveGAN](https://github.com/kan-bayashi/ParallelWaveGAN)
3. [tensorflow/addons](https://github.com/tensorflow/addons)
4. [HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis](https://arxiv.org/abs/2010.05646)
5. [MelGAN: Generative Adversarial Networks for Conditional Waveform Synthesis](https://arxiv.org/abs/1910.06711)
6. [Parallel WaveGAN: A fast waveform generation model based on generative adversarial networks with multi-resolution spectrogram](https://arxiv.org/abs/1910.11480)