Commit · af37d8b
1 Parent(s): 8235b4f
updated README.md
README.md CHANGED
@@ -1,123 +1,10 @@
-
-
-
-
-
-
-
-
-
-
-pip install -r requirements.txt
-```
-
-| Pretrained-Model | Dataset | Epochs | Train Loss | Valid Loss |
-|:------------:|:------------:|:------------:|:------------:|:------------:|
-| [checkpoint.th](https://drive.google.com/drive/folders/1WzhvH1oIB9LqoTyItA6jViTRai5aURzJ?usp=sharing) | Librimix-7 (16k-mix_clean) | 31 | 0.04 | 0.64 |
-
-This is an intermediate checkpoint, provided for demo purposes only.
-
-Create the directory ```outputs/exp_``` and save the checkpoint there.
-```
-svoice_demo
-├── outputs
-│   └── exp_
-│       └── checkpoint.th
-...
-```
-
-## Running the End-to-End Project
-#### Terminal 1
-```bash
-conda activate svoice
-python demo.py
-```
-
-## Training
-Create the ```mix_clean``` dataset with a sample rate of ```16K``` using the [librimix](https://github.com/shakeddovrat/librimix) repo.
-
-Dataset Structure
-```
-svoice_demo
-├── Libri7Mix_Dataset
-│   └── wav16k
-│       └── min
-│       │   └── dev
-│       │       └── ...
-│       │   └── test
-│       │       └── ...
-│       │   └── train-360
-│       │       └── ...
-...
-```
-
-#### Create ```metadata``` files
-For Librimix7 dataset
-```
-bash create_metadata_librimix7.sh
-```
-
-For Librimix10 dataset
-```
-bash create_metadata_librimix10.sh
-```
-
-Change ```conf/config.yaml``` according to your settings. Set the ```C``` value at line 66 to the number of speakers (for example, ```C: 10```).
-
-```
-python train.py
-```
-This will automatically read all the configurations from the `conf/config.yaml` file.
-To learn more about training, refer to the original [svoice](https://github.com/facebookresearch/svoice) repo.
-
-#### Distributed Training
-
-```
-python train.py ddp=1
-```
-
-### Evaluating
-
-```
-python -m svoice.evaluate <path to the model> <path to folder containing mix.json and all target separated channels json files s<ID>.json>
-```
-
-### Citation
-
-The svoice code is borrowed from the original [svoice](https://github.com/facebookresearch/svoice) repository. All rights to the code are reserved by [META Research](https://github.com/facebookresearch).
-
-```
-@inproceedings{nachmani2020voice,
-  title={Voice Separation with an Unknown Number of Multiple Speakers},
-  author={Nachmani, Eliya and Adi, Yossi and Wolf, Lior},
-  booktitle={Proceedings of the 37th International Conference on Machine Learning},
-  year={2020}
-}
-```
-```
-@misc{cosentino2020librimix,
-  title={LibriMix: An Open-Source Dataset for Generalizable Speech Separation},
-  author={Joris Cosentino and Manuel Pariente and Samuele Cornell and Antoine Deleforge and Emmanuel Vincent},
-  year={2020},
-  eprint={2005.11262},
-  archivePrefix={arXiv},
-  primaryClass={eess.AS}
-}
-```
-## License
-This repository is released under the CC-BY-NC-SA 4.0 license as found in the [LICENSE](LICENSE) file.
-
-The files `svoice/models/sisnr_loss.py` and `svoice/data/preprocess.py` were adapted from the [kaituoxu/Conv-TasNet][convtas] repository, an unofficial implementation of the [Conv-TasNet: Surpassing Ideal Time-Frequency Magnitude Masking for Speech Separation][convtas-paper] paper, released under the MIT License.
-Additionally, several input manipulation functions were borrowed and modified from the [yluo42/TAC][tac] repository, released under the CC BY-NC-SA 3.0 License.
-
-[icml]: https://arxiv.org/abs/2003.01531.pdf
-[icassp]: https://arxiv.org/pdf/2011.02329.pdf
-[web]: https://enk100.github.io/speaker_separation/
-[pytorch]: https://pytorch.org/
-[hydra]: https://github.com/facebookresearch/hydra
-[hydra-web]: https://hydra.cc/
-[convtas]: https://github.com/kaituoxu/Conv-TasNet
-[convtas-paper]: https://arxiv.org/pdf/1809.07454.pdf
-[tac]: https://github.com/yluo42/TAC
-[nprirgen]: https://github.com/ty274/rir-generator
-[rir]: https://asa.scitation.org/doi/10.1121/1.382599
+---
+title: svoice_demo
+emoji: π₯
+colorFrom: blue
+colorTo: green
+sdk: gradio
+sdk_version: 3.11.0
+app_file: app.py
+pinned: false
+---
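
The ten added lines form a flat `key: value` front-matter block delimited by `---`. As a rough illustration of how such a block could be read and sanity-checked, here is a minimal Python sketch. It is not the parser the Spaces platform uses: the helper names `parse_front_matter` and `missing_keys` and the `REQUIRED_KEYS` set are hypothetical, chosen only for this example, and the `emoji` field is left out of the sample text.

```python
from typing import Dict, List

# Hypothetical minimal set of keys to check for; an assumption of this sketch.
REQUIRED_KEYS = ("title", "sdk", "app_file")


def parse_front_matter(text: str) -> Dict[str, str]:
    """Return the flat key/value pairs between the leading '---' delimiters.

    Only simple `key: value` lines are handled; values stay plain strings.
    Returns an empty dict if the block is missing or never closed.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    meta: Dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":
            return meta  # closing delimiter reached
        if ":" in line:
            key, _, value = line.partition(":")
            meta[key.strip()] = value.strip()
    return {}  # no closing delimiter found


def missing_keys(meta: Dict[str, str]) -> List[str]:
    """List the keys from REQUIRED_KEYS that are absent from the parsed block."""
    return [key for key in REQUIRED_KEYS if key not in meta]


# Sample text mirroring the added lines of the diff (emoji field omitted).
README = """---
title: svoice_demo
colorFrom: blue
colorTo: green
sdk: gradio
sdk_version: 3.11.0
app_file: app.py
pinned: false
---
"""

meta = parse_front_matter(README)
print(meta["sdk"], meta["app_file"], missing_keys(meta))
```

Values are kept as strings here (so `pinned` is `"false"`, not a boolean); a real YAML parser would type them.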