📝 About

The Open Voice Cloning Leaderboard is part of the ClonEval benchmark. In addition to the Leaderboard, the benchmark consists of:

a deterministic evaluation protocol that sets defaults for data, metrics, and models to be used in the voice cloning assessment process,
an open-source software library that can be used to evaluate voice cloning models in a reproducible manner.

Evaluation Procedure

The evaluation procedure has two stages. In the first stage, samples are generated with a voice cloning model. The model must take two inputs: a voice sample to be cloned and the text of the utterance to synthesize.

In the second stage, the generated samples are evaluated. Speaker embeddings are obtained with the WavLM model for every reference and generated sample. For each pair of samples (reference and generated), cosine similarity is calculated between their WavLM speaker embeddings and between the values of the acoustic features extracted from them. The similarity values obtained on all samples from a given dataset are averaged to produce the final evaluation result.
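The scoring step described above can be illustrated with a minimal sketch. It is not the ClonEval implementation: the function names `cosine_similarity` and `dataset_score` are hypothetical, the toy 3-dimensional vectors stand in for real WavLM speaker embeddings, and the handling of zero vectors is an assumption made only for this example.

```python
import math
from statistics import mean


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # Assumption for this sketch: a zero vector has no similarity.
        return 0.0
    return dot / (norm_a * norm_b)


def dataset_score(pairs):
    """Average cosine similarity over (reference, generated) vector pairs."""
    return mean([cosine_similarity(ref, gen) for ref, gen in pairs])


# Toy vectors standing in for speaker embeddings (hypothetical data).
pairs = [
    ([1.0, 0.0, 0.0], [1.0, 0.0, 0.0]),  # identical direction
    ([1.0, 0.0, 0.0], [0.0, 1.0, 0.0]),  # orthogonal
]
print(dataset_score(pairs))
```

The same two functions could be applied to vectors of acoustic feature values instead of embeddings, since the procedure uses cosine similarity in both cases.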

To support fine-grained error analysis, we also extract acoustic features from each sample with Librosa.

Software Library

The code for the evaluation procedure is available in the GitHub repository.