Hervé Bredin committed
Commit f47dcce · 1 Parent(s): d80ca5c

feat: initial import

Files changed (7)
  1. README.md +110 -0
  2. config.yaml +93 -0
  3. hparams.yaml +15 -0
  4. overrides.yaml +22 -0
  5. pytorch_model.bin +3 -0
  6. tfevents.bin +3 -0
  7. train.log +18 -0
README.md ADDED
@@ -0,0 +1,110 @@
---
tags:
- pyannote
- audio
- voice
- speech
- speaker
- speaker segmentation
- voice activity detection
- overlapped speech detection
- resegmentation
datasets:
- ami
- dihard
- voxconverse
license: mit
inference: false
---

# Pretrained speaker segmentation model

This model relies on `pyannote.audio` 2.0 (which is still in development):

```bash
$ pip install https://github.com/pyannote/pyannote-audio/archive/develop.zip
```

## Basic inference

```python
>>> from pyannote.audio import Inference
>>> inference = Inference("pyannote/Segmentation")
>>> segmentation = inference("audio.wav")
```
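
The result is a `pyannote.core.SlidingWindowFeature`: a `(num_frames, num_speakers)` array of raw activation scores, plus the sliding window that maps frame indices back to timestamps. A minimal sketch of how one might inspect it (attribute names assume the `pyannote.core` API of the `develop` branch):

```python
>>> scores = segmentation.data            # numpy array of shape (num_frames, num_speakers)
>>> frames = segmentation.sliding_window  # maps frame indices to time (seconds)
>>> print(scores.shape, frames.duration, frames.step)
```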

## Advanced pipelines

### Voice activity detection

```python
>>> from pyannote.audio.pipelines import VoiceActivityDetection
>>> HYPER_PARAMETERS = {"onset": 0.5, "offset": 0.5, "min_duration_on": 0.0, "min_duration_off": 0.0}
>>> pipeline = VoiceActivityDetection(segmentation="pyannote/Segmentation").instantiate(HYPER_PARAMETERS)
>>> vad = pipeline("audio.wav")
```

In this dictionary, `onset` and `offset` are the thresholds at which speech regions start and end on the activation curve; regions shorter than `min_duration_on` are then removed, and non-speech gaps shorter than `min_duration_off` are filled.

Dataset         | `onset` | `offset` | `min_duration_on` | `min_duration_off`
----------------|---------|----------|-------------------|-------------------
AMI Mix-Headset | TODO    | TODO     | TODO              | TODO
DIHARD3         | TODO    | TODO     | TODO              | TODO
VoxConverse     | TODO    | TODO     | TODO              | TODO
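
The pipeline output `vad` is a `pyannote.core.Annotation`. As a short usage sketch (relying on the standard `pyannote.core` API), detected speech regions can be listed like this:

```python
>>> for speech in vad.get_timeline().support():
...     print(f"speech from {speech.start:.1f}s to {speech.end:.1f}s")
```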

### Overlapped speech detection

```python
>>> from pyannote.audio.pipelines import OverlappedSpeechDetection
>>> # reuses the HYPER_PARAMETERS dictionary defined above
>>> pipeline = OverlappedSpeechDetection(segmentation="pyannote/Segmentation").instantiate(HYPER_PARAMETERS)
>>> osd = pipeline("audio.wav")
```

Dataset         | `onset` | `offset` | `min_duration_on` | `min_duration_off`
----------------|---------|----------|-------------------|-------------------
AMI Mix-Headset | TODO    | TODO     | TODO              | TODO
DIHARD3         | TODO    | TODO     | TODO              | TODO
VoxConverse     | TODO    | TODO     | TODO              | TODO
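
Like voice activity detection, this pipeline returns a `pyannote.core.Annotation`. For instance, the total amount of overlapped speech can be measured with the timeline API (a sketch, not part of the pipeline itself):

```python
>>> overlap = osd.get_timeline().support()
>>> print(f"{overlap.duration():.1f}s of overlapped speech in total")
```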

### Segmentation

```python
>>> from pyannote.audio.pipelines import Segmentation
>>> pipeline = Segmentation(segmentation="pyannote/Segmentation").instantiate(HYPER_PARAMETERS)
>>> seg = pipeline("audio.wav")
```

Dataset         | `onset` | `offset` | `min_duration_on` | `min_duration_off`
----------------|---------|----------|-------------------|-------------------
AMI Mix-Headset | TODO    | TODO     | TODO              | TODO
DIHARD3         | TODO    | TODO     | TODO              | TODO
VoxConverse     | TODO    | TODO     | TODO              | TODO
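
Here, `seg` is again a `pyannote.core.Annotation`, with one label per detected speaker. A short sketch of how one might enumerate it (assuming the standard `pyannote.core` iteration API):

```python
>>> for segment, _, label in seg.itertracks(yield_label=True):
...     print(f"{label} is active from {segment.start:.1f}s to {segment.end:.1f}s")
```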

### Resegmentation

```python
>>> from pyannote.audio.pipelines import Resegmentation
>>> from pyannote.core import Annotation
>>> pipeline = Resegmentation(segmentation="pyannote/Segmentation", diarization="baseline").instantiate(HYPER_PARAMETERS)
>>> # `baseline` is the output of an existing diarization system
>>> assert isinstance(baseline, Annotation)
>>> resegmented_baseline = pipeline({"audio": "audio.wav", "baseline": baseline})
```

Dataset         | `onset` | `offset` | `min_duration_on` | `min_duration_off`
----------------|---------|----------|-------------------|-------------------
AMI Mix-Headset | TODO    | TODO     | TODO              | TODO
DIHARD3         | TODO    | TODO     | TODO              | TODO
VoxConverse     | TODO    | TODO     | TODO              | TODO
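
The `baseline` annotation can come from any diarization system. For illustration only, one could load it from an existing RTTM file with `pyannote.database` (the filename and URI below are hypothetical):

```python
>>> from pyannote.database.util import load_rttm
>>> # load_rttm returns a {uri: Annotation} dictionary
>>> baseline = load_rttm("baseline.rttm")["audio"]  # hypothetical file and URI
```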

## Citation

```bibtex
@inproceedings{Bredin2020,
  Title = {{pyannote.audio: neural building blocks for speaker diarization}},
  Author = {{Bredin}, Herv{\'e} and {Yin}, Ruiqing and {Coria}, Juan Manuel and {Gelly}, Gregory and {Korshunov}, Pavel and {Lavechin}, Marvin and {Fustes}, Diego and {Titeux}, Hadrien and {Bouaziz}, Wassim and {Gill}, Marie-Philippe},
  Booktitle = {ICASSP 2020, IEEE International Conference on Acoustics, Speech, and Signal Processing},
  Address = {Barcelona, Spain},
  Month = {May},
  Year = {2020},
}
```
config.yaml ADDED
@@ -0,0 +1,93 @@
protocol: X.SpeakerDiarization.Custom
patience: 20
task:
  _target_: pyannote.audio.tasks.Segmentation
  duration: 5.0
  warm_up: 0.0
  balance: null
  overlap:
    probability: 0.5
    snr_min: 0.0
    snr_max: 10.0
  weight: null
  batch_size: 32
  num_workers: 10
  pin_memory: false
  loss: bce
  vad_loss: bce
model:
  _target_: pyannote.audio.models.segmentation.PyanNet
  sincnet:
    stride: 10
  lstm:
    num_layers: 4
    monolithic: true
    dropout: 0.5
  linear:
    num_layers: 2
optimizer:
  _target_: torch.optim.Adam
  lr: 0.001
  betas:
  - 0.9
  - 0.999
  eps: 1.0e-08
  weight_decay: 0
  amsgrad: false
trainer:
  _target_: pytorch_lightning.Trainer
  accelerator: ddp
  accumulate_grad_batches: 1
  amp_backend: native
  amp_level: O2
  auto_lr_find: false
  auto_scale_batch_size: false
  auto_select_gpus: true
  benchmark: true
  check_val_every_n_epoch: 1
  checkpoint_callback: true
  deterministic: false
  fast_dev_run: false
  flush_logs_every_n_steps: 100
  gpus: -1
  gradient_clip_val: 0.5
  limit_test_batches: 1.0
  limit_train_batches: 1.0
  limit_val_batches: 1.0
  log_every_n_steps: 50
  log_gpu_memory: null
  max_epochs: 1000
  max_steps: null
  min_epochs: 1
  min_steps: null
  num_nodes: 1
  num_processes: 1
  num_sanity_val_steps: 2
  overfit_batches: 0.0
  precision: 32
  prepare_data_per_node: true
  process_position: 0
  profiler: null
  progress_bar_refresh_rate: 1
  reload_dataloaders_every_epoch: false
  replace_sampler_ddp: true
  sync_batchnorm: false
  terminate_on_nan: false
  tpu_cores: null
  track_grad_norm: -1
  truncated_bptt_steps: null
  val_check_interval: 1.0
  weights_save_path: null
  weights_summary: top
augmentation:
  transform: Compose
  params:
    shuffle: false
    transforms:
    - transform: AddBackgroundNoise
      params:
        background_paths: /gpfswork/rech/eie/commun/data/background/musan
        min_snr_in_db: 5.0
        max_snr_in_db: 15.0
        mode: per_example
        p: 0.9
hparams.yaml ADDED
@@ -0,0 +1,15 @@
linear:
  hidden_size: 128
  num_layers: 2
lstm:
  batch_first: true
  bidirectional: true
  dropout: 0.5
  hidden_size: 128
  monolithic: true
  num_layers: 4
num_channels: 1
sample_rate: 16000
sincnet:
  sample_rate: 16000
  stride: 10
overrides.yaml ADDED
@@ -0,0 +1,22 @@
- protocol=X.SpeakerDiarization.Custom
- task=Segmentation
- task.batch_size=32
- task.num_workers=10
- task.duration=5.
- task.warm_up=0.
- task.loss=bce
- task.vad_loss=bce
- patience=20
- model=PyanNet
- +model.sincnet.stride=10
- +model.lstm.num_layers=4
- +model.lstm.monolithic=True
- +model.lstm.dropout=0.5
- +model.linear.num_layers=2
- optimizer=Adam
- optimizer.lr=0.001
- trainer.benchmark=True
- trainer.gradient_clip_val=0.5
- trainer.gpus=-1
- trainer.accelerator=ddp
- +augmentation=background
pytorch_model.bin ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:c7d2e72ce20167e5eb05ce163b7af9762e92ef5fec7313435b676b74b8498afe
size 17739960
tfevents.bin ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:c2b33b3855ecc446b1913916d8369ede8597b66491541a6c67e5ceafc15bcdb3
size 13357699
train.log ADDED
@@ -0,0 +1,18 @@
[2021-03-19 18:29:57,529][lightning][INFO] - GPU available: True, used: True
[2021-03-19 18:29:57,531][lightning][INFO] - TPU available: None, using: 0 TPU cores
[2021-03-19 18:29:57,531][lightning][INFO] - LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0,1,2,3]
[2021-03-19 18:30:08,622][lightning][INFO] - initializing ddp: GLOBAL_RANK: 0, MEMBER: 1/4
[2021-03-19 18:32:58,993][lightning][INFO] - Set SLURM handle signals.
[2021-03-19 18:32:59,068][lightning][INFO] -
  | Name       | Type       | Params | In sizes       | Out sizes
------------------------------------------------------------------------------------------------------------
0 | sincnet    | SincNet    | 42.6 K | [32, 1, 80000] | [32, 60, 293]
1 | lstm       | LSTM       | 1.4 M  | [32, 293, 60]  | [[32, 293, 256], [[8, 32, 128], [8, 32, 128]]]
2 | linear     | ModuleList | 49.4 K | ?              | ?
3 | classifier | Linear     | 516    | [32, 293, 128] | [32, 293, 4]
4 | activation | Sigmoid    | 0      | [32, 293, 4]   | [32, 293, 4]
------------------------------------------------------------------------------------------------------------
1.5 M     Trainable params
0         Non-trainable params
1.5 M     Total params
[2021-03-23 02:26:47,615][lightning][INFO] - bypassing sigterm