Commit 3259f6e

Parent(s): 675a69a

update README with inference

README.md CHANGED
@@ -22,7 +22,47 @@ metrics:
 
 # EfficientTDNN
 
-
+This repository provides all the necessary tools to perform speaker verification with a NAS alternative named EfficientTDNN.
+The system can be used to extract speaker embeddings at different model sizes.
+It is trained on VoxCeleb2 training data using data augmentation.
+The model performance on the VoxCeleb1 test set (cleaned) / Vox1-O is reported as follows.
+
+| Supernet Stage | Subnet | MACs (3-second) | Params | EER(%) w/ AS-Norm | EER(%) w/o AS-Norm | minDCF w/ AS-Norm | minDCF w/o AS-Norm |
+|:--------------:|:------:|:---------------:|:------:|:-----------------:|:------------------:|:-----------------:|:------------------:|
+| depth   | Base   | 1.45G   | 5.79M   | 0.94 | 1.14 | 0.089 | 0.106 |
+| width 1 | Mobile | 570.98M | 2.42M   | 1.41 | 1.61 | 0.124 | 0.152 |
+| width 2 | Small  | 204.07M | 899.20K | 2.20 | 2.33 | 0.219 | 0.241 |
+
+The details of the three subnets are:
+
+- Base: (3, [512, 512, 512, 512], [5, 3, 3, 3], 1536)
+- Mobile: (3, [384, 256, 256, 256], [5, 3, 3, 3], 768)
+- Small: (2, [256, 256, 256], [3, 3, 3], 400)
+
+## Compute your speaker embeddings
+
+```python
+import torchaudio
+from sugar.models import WrappedModel
+
+wav_file = f"{vox1_root}/id10270/x6uYqmx31kE/00001.wav"
+signal, fs = torchaudio.load(wav_file)
+
+repo_id = "mechanicalsea/efficient-tdnn"
+supernet_filename = "depth/depth.torchparams"
+subnet_filename = "depth/depth.ecapa-tdnn.3.512.512.512.512.5.3.3.3.1536.bn.tar"
+subnet, info = WrappedModel.from_pretrained(
+    repo_id=repo_id, supernet_filename=supernet_filename, subnet_filename=subnet_filename)
+
+embedding = subnet(signal)
+```
+
+## Inference on GPU
+
+To perform inference on the GPU, add `subnet = subnet.to(device)` after calling the `from_pretrained` method.
+
+## Model Description
+
+Models are listed as follows.
 
 - **Dynamic Kernel**: The model enables various kernel sizes in {1,3,5}, `kernel/kernel.torchparams`.
 - **Dynamic Depth**: The model enables various depths in {2,3,4} on top of the **Dynamic Kernel** version, `depth/depth.torchparams`.
@@ -59,10 +99,10 @@ Furthermore, some subnets are given in the form of the weights of batchnorm corr
 
 The tag is described as follows.
 
-- max:
-- Kmin:
-- Dmin:
-- C1min:
-- C2min:
+- max: (4, [512, 512, 512, 512, 512], [5, 5, 5, 5, 5], 1536)
+- Kmin: (4, [512, 512, 512, 512, 512], [1, 1, 1, 1, 1], 1536)
+- Dmin: (2, [512, 512, 512], [1, 1, 1], 1536)
+- C1min: (2, [256, 256, 256], [1, 1, 1], 768)
+- C2min: (2, [128, 128, 128], [1, 1, 1], 384)
 
 More details about EfficientTDNN can be found in the paper [EfficientTDNN](https://arxiv.org/abs/2103.13581).
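Across the subnet and tag tuples listed in the diff, the layout appears to be (depth, per-block channels, per-block kernel sizes, final-layer width), with depth + 1 channel and kernel entries each. A minimal sketch that unpacks this layout — the field names are my own labels, not identifiers from the repository:

```python
# Hypothetical helper illustrating how the README's subnet tuples seem to
# be structured. "final_width" is an assumed name for the last element.
def describe_subnet(config):
    depth, channels, kernels, final_width = config
    # Every tuple in the README lists depth + 1 channel/kernel entries.
    assert len(channels) == len(kernels) == depth + 1
    return {"depth": depth, "channels": channels,
            "kernels": kernels, "final_width": final_width}

base = describe_subnet((3, [512, 512, 512, 512], [5, 3, 3, 3], 1536))
small = describe_subnet((2, [256, 256, 256], [3, 3, 3], 400))
```

This makes the relationship between the named subnets (Base, Mobile, Small) and the tag extremes (max, Kmin, Dmin, C1min, C2min) easy to compare programmatically.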
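The GPU note in the diff follows the standard PyTorch device-placement pattern: move the model once, then keep inputs on the same device. A sketch with a stand-in module, since downloading the pretrained subnet is outside the scope of this example (the real model comes from `WrappedModel.from_pretrained` as shown in the README):

```python
import torch

# Stand-in for the pretrained subnet; replace with the WrappedModel
# subnet returned by from_pretrained in real use.
subnet = torch.nn.Linear(16000, 192)

# Prefer the GPU when available, otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
subnet = subnet.to(device)

# Inputs must live on the same device as the model's parameters.
signal = torch.randn(1, 16000, device=device)
with torch.no_grad():
    embedding = subnet(signal)
```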
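Since the extracted embeddings are intended for speaker verification, trial pairs are typically scored with cosine similarity (the EER/minDCF numbers in the table are computed over such scores). A library-free sketch on toy vectors — real inputs would be the embeddings produced by the subnet:

```python
import math

def cosine_score(a, b):
    # Cosine similarity: dot(a, b) / (||a|| * ||b||).
    # Higher scores suggest the two embeddings share a speaker.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy vectors standing in for two utterances of one speaker vs. a third.
same_pair = cosine_score([0.2, 0.9, 0.1], [0.25, 0.85, 0.05])
diff_pair = cosine_score([0.2, 0.9, 0.1], [-0.7, 0.1, 0.6])
```

A threshold on this score then decides accept/reject; AS-Norm, mentioned in the table, normalizes these raw scores against a cohort before thresholding.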