# AudioCraft objective metrics

In addition to training losses, AudioCraft provides a set of objective metrics
for audio synthesis and audio generation. As these metrics may require
extra dependencies and can be costly to compute, they are often disabled by default.
This section provides guidance for setting up and using these metrics in
the AudioCraft training pipelines.
## Available metrics

### Audio synthesis quality metrics

#### SI-SNR
We provide an implementation of the Scale-Invariant Signal-to-Noise Ratio in PyTorch.
No specific requirement is needed for this metric. Please activate the metric at the
evaluation stage with the appropriate flag:

```shell
dora run <...> evaluate.metrics.sisnr=true
```

**Warning:** We report the opposite of the SI-SNR, i.e. the score multiplied by -1. This is due to internal
details where the SI-SNR score can also be used as a training loss, for which lower
values should indicate better reconstruction. Negative values are thus expected and a good sign! They should be multiplied by `-1` again before publication :)
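For reference, SI-SNR itself is a short computation. Below is a minimal PyTorch sketch of the underlying formula (illustrative, not the exact code used in the pipeline): the estimate is projected onto the reference, and the ratio between the projected target energy and the residual energy is expressed in dB.

```python
import torch

def si_snr(ref: torch.Tensor, est: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """Scale-Invariant SNR for waveforms of shape [..., time], in dB."""
    # Remove the mean so the metric is invariant to DC offset.
    ref = ref - ref.mean(dim=-1, keepdim=True)
    est = est - est.mean(dim=-1, keepdim=True)
    # Project the estimate onto the reference to obtain the scaled target.
    dot = (ref * est).sum(dim=-1, keepdim=True)
    energy = ref.pow(2).sum(dim=-1, keepdim=True) + eps
    target = (dot / energy) * ref
    noise = est - target
    ratio = target.pow(2).sum(dim=-1) / (noise.pow(2).sum(dim=-1) + eps)
    return 10 * torch.log10(ratio + eps)

# As noted above, the pipeline logs -si_snr(...), so lower reported values are better.
```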
#### ViSQOL

We provide a Python wrapper around the ViSQOL [official implementation](https://github.com/google/visqol)
to conveniently run ViSQOL within the training pipelines.

One must specify the path to the ViSQOL installation through the configuration in order
to enable ViSQOL computations in AudioCraft:

```shell
# the first parameter activates the ViSQOL computation while the second specifies
# the path to the ViSQOL installation to be used by our python wrapper
dora run <...> evaluate.metrics.visqol=true metrics.visqol.bin=<path_to_visqol>
```

See an example grid: [Compression with ViSQOL](../audiocraft/grids/compression/encodec_musicgen_32khz.py)

To learn more about ViSQOL and how to build the ViSQOL binary using bazel, please refer to the
instructions available in the [open source repository](https://github.com/google/visqol).
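For intuition, the wrapper essentially shells out to the compiled ViSQOL binary and parses the reported MOS-LQO score. Here is a simplified, hypothetical sketch; the command-line flags follow ViSQOL's documented CLI, and the output parsing is an assumption, while the actual wrapper also handles model paths and resampling.

```python
import subprocess

def run_visqol(visqol_bin: str, reference_file: str, degraded_file: str) -> float:
    """Run the compiled ViSQOL binary on a pair of wav files and return MOS-LQO."""
    result = subprocess.run(
        [visqol_bin, "--reference_file", reference_file, "--degraded_file", degraded_file],
        capture_output=True, text=True, check=True,
    )
    # ViSQOL prints the final score on a line such as "MOS-LQO: 4.30" (assumed format).
    for line in result.stdout.splitlines():
        if "MOS-LQO" in line:
            return float(line.split(":")[-1])
    raise RuntimeError(f"Could not parse ViSQOL output:\n{result.stdout}")
```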
### Audio generation metrics

#### Frechet Audio Distance

Similarly to ViSQOL, we use a Python wrapper around the Frechet Audio Distance
[official implementation](https://github.com/google-research/google-research/tree/master/frechet_audio_distance)
in TensorFlow.

Note that we had to make several changes to the original code in order to make it work.
Please refer to the [FrechetAudioDistanceMetric](../audiocraft/metrics/fad.py) class documentation
for more details. We do not plan to provide further support in obtaining a working setup for the
Frechet Audio Distance at this stage.

```shell
# the first parameter activates the FAD metric computation while the second specifies
# the path to the FAD library to be used by our python wrapper
dora run <...> evaluate.metrics.fad=true metrics.fad.bin=<path_to_google_research_repository>
```

See an example grid: [Evaluation with FAD](../audiocraft/grids/musicgen/musicgen_pretrained_32khz_eval.py)
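For background, once embeddings have been extracted for the reference and generated sets (the embedding model is handled by the TensorFlow code above), the FAD is the Fréchet distance between two Gaussians fitted to those embeddings. A minimal sketch of that final computation:

```python
import numpy as np
from scipy import linalg

def frechet_distance(mu1: np.ndarray, sigma1: np.ndarray,
                     mu2: np.ndarray, sigma2: np.ndarray) -> float:
    """Frechet distance between Gaussians N(mu1, sigma1) and N(mu2, sigma2),
    each fitted on the embeddings of one set of audio samples."""
    diff = mu1 - mu2
    # Matrix square root of the product of the covariance matrices.
    covmean, _ = linalg.sqrtm(sigma1 @ sigma2, disp=False)
    covmean = covmean.real  # drop tiny imaginary parts from numerical error
    return float(diff @ diff + np.trace(sigma1 + sigma2 - 2.0 * covmean))
```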
#### Kullback-Leibler Divergence

We provide a PyTorch implementation of the Kullback-Leibler Divergence computed over the probabilities
of the labels obtained from a state-of-the-art audio classifier. We provide our implementation of the KLD
using the [PaSST classifier](https://github.com/kkoutini/PaSST).

In order to use the KLD metric with PaSST, you must install the PaSST library as an extra dependency:

```shell
pip install 'git+https://github.com/kkoutini/[email protected]#egg=hear21passt'
```

Then, similarly, you can activate the metric with the corresponding flag:

```shell
# the kld metric can be extended with additional audio classifier models that can then be picked through the configuration
dora run <...> evaluate.metrics.kld=true metrics.kld.model=passt
```
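To make the quantity concrete, here is a minimal PyTorch sketch of the divergence itself, computed between the class-probability vectors that a classifier such as PaSST outputs for the reference and generated audio. The actual metric additionally handles model loading, resampling, and aggregation over the dataset.

```python
import torch
import torch.nn.functional as F

def label_kl_divergence(p_ref: torch.Tensor, p_gen: torch.Tensor,
                        eps: float = 1e-6) -> torch.Tensor:
    """KL(p_ref || p_gen) between classifier label probabilities of shape [batch, classes]."""
    # Clamp to avoid log(0) on classes with zero predicted probability.
    p_ref = p_ref.clamp(min=eps)
    p_gen = p_gen.clamp(min=eps)
    # F.kl_div expects log-probabilities as input and probabilities as target.
    return F.kl_div(p_gen.log(), p_ref, reduction="batchmean")
```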
#### Text consistency

We provide a text consistency metric, similar to the MuLan Cycle Consistency from
[MusicLM](https://arxiv.org/pdf/2301.11325.pdf) or the CLAP score used in
[Make-An-Audio](https://arxiv.org/pdf/2301.12661v1.pdf).
More specifically, we provide a PyTorch implementation of a text consistency metric
relying on a pre-trained [Contrastive Language-Audio Pretraining (CLAP)](https://github.com/LAION-AI/CLAP) model.

Please install the CLAP library as an extra dependency prior to using the metric:

```shell
pip install laion_clap
```

Then, similarly, you can activate the metric with the corresponding flag:

```shell
# the text consistency metric can be extended with additional audio-text embedding models that can then be picked through the configuration
dora run <...> evaluate.metrics.text_consistency=true metrics.text_consistency.model=clap
```

Note that the text consistency metric based on CLAP requires the CLAP checkpoint to be
provided in the configuration.
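The score itself is the cosine similarity between the CLAP embeddings of the text prompt and of the generated audio. Below is a hedged sketch using the `laion_clap` API; the checkpoint path and prompt are placeholders, and the exact loading options may differ, so consult the CLAP repository for details.

```python
import laion_clap
import torch

model = laion_clap.CLAP_Module(enable_fusion=False)
model.load_ckpt("path/to/clap_checkpoint.pt")  # placeholder, set via the configuration

descriptions = ["a funky bass line with a tight drum groove"]
audio = torch.randn(1, 480000)  # stand-in for 10s of generated audio at 48 kHz

text_emb = model.get_text_embedding(descriptions, use_tensor=True)
audio_emb = model.get_audio_embedding_from_data(x=audio, use_tensor=True)

# Text consistency: cosine similarity between text and audio embeddings.
score = torch.nn.functional.cosine_similarity(text_emb, audio_emb, dim=-1)
```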
#### Chroma cosine similarity

Finally, as introduced in MusicGen, we provide a Chroma Cosine Similarity metric in PyTorch.
No specific requirement is needed for this metric. Please activate the metric at the
evaluation stage with the appropriate flag:

```shell
dora run <...> evaluate.metrics.chroma_cosine=true
```
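Conceptually, the metric takes the frame-wise cosine similarity between the chromagrams of a reference and a generated waveform. Here is a simplified sketch using a librosa chroma filterbank on top of a PyTorch STFT; the parameters are illustrative, not the exact ones used in the pipeline.

```python
import librosa
import torch
import torch.nn.functional as F

def chroma_cosine_similarity(ref: torch.Tensor, gen: torch.Tensor,
                             sample_rate: int, n_fft: int = 2048) -> torch.Tensor:
    """Mean frame-wise cosine similarity between chromagrams of two waveforms [time]."""
    # Filterbank mapping n_fft // 2 + 1 frequency bins to 12 pitch classes.
    fbank = torch.from_numpy(librosa.filters.chroma(sr=sample_rate, n_fft=n_fft)).float()

    def chroma(wav: torch.Tensor) -> torch.Tensor:
        window = torch.hann_window(n_fft)
        spec = torch.stft(wav, n_fft=n_fft, window=window, return_complex=True).abs()
        return fbank @ spec  # (12, frames)

    c_ref, c_gen = chroma(ref), chroma(gen)
    n_frames = min(c_ref.shape[-1], c_gen.shape[-1])
    sim = F.cosine_similarity(c_ref[..., :n_frames], c_gen[..., :n_frames], dim=0)
    return sim.mean()
```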
#### Comparing against reconstructed audio

For all the above audio generation metrics, we offer the option to compute the metric on the ground-truth audio
compressed and reconstructed by EnCodec instead of on the generated sample, using the flag `<metric>.use_gt=true`.
## Example usage

You will find examples of configuration for the different metrics introduced above in:
* The [musicgen default solver](../config/solver/musicgen/default.yaml) for all audio generation metrics
* The [compression default solver](../config/solver/compression/default.yaml) for all audio synthesis metrics

Similarly, we provide different examples in our grids:
* [Evaluation with ViSQOL](../audiocraft/grids/compression/encodec_musicgen_32khz.py)
* [Evaluation with FAD and others](../audiocraft/grids/musicgen/musicgen_pretrained_32khz_eval.py)