Hervé Bredin committed
Commit db94671 · 1 Parent(s): e900ca9
feat: rename /paper to /reproducible_research
Files changed:
- README.md +23 -41
- {paper → reproducible_research}/dihard3_custom_split/development.txt +0 -0
- {paper → reproducible_research}/dihard3_custom_split/train.txt +0 -0
- {paper → reproducible_research}/expected_outputs/osd/AMI.development.rttm +0 -0
- {paper → reproducible_research}/expected_outputs/osd/AMI.test.rttm +0 -0
- {paper → reproducible_research}/expected_outputs/osd/DIHARD.development.rttm +0 -0
- {paper → reproducible_research}/expected_outputs/osd/DIHARD.test.rttm +0 -0
- {paper → reproducible_research}/expected_outputs/osd/VoxConverse.development.rttm +0 -0
- {paper → reproducible_research}/expected_outputs/osd/VoxConverse.test.rttm +0 -0
- {paper → reproducible_research}/expected_outputs/rsg/AMI.development.rttm +0 -0
- {paper → reproducible_research}/expected_outputs/rsg/AMI.test.rttm +0 -0
- {paper → reproducible_research}/expected_outputs/rsg/DIHARD.development.rttm +0 -0
- {paper → reproducible_research}/expected_outputs/rsg/DIHARD.test.rttm +0 -0
- {paper → reproducible_research}/expected_outputs/rsg/VoxConverse.development.rttm +0 -0
- {paper → reproducible_research}/expected_outputs/vad/AMI.development.rttm +0 -0
- {paper → reproducible_research}/expected_outputs/vad/AMI.test.rttm +0 -0
- {paper → reproducible_research}/expected_outputs/vad/DIHARD.development.rttm +0 -0
- {paper → reproducible_research}/expected_outputs/vad/DIHARD.test.rttm +0 -0
- {paper → reproducible_research}/expected_outputs/vad/VoxConverse.development.rttm +0 -0
- {paper → reproducible_research}/expected_outputs/vad/VoxConverse.test.rttm +0 -0
- {paper → reproducible_research}/expected_outputs/vbx/AMI.rttm +0 -0
- {paper → reproducible_research}/expected_outputs/vbx/DIHARD.rttm +0 -0
- {paper → reproducible_research}/expected_outputs/vbx/VoxConverse.rttm +0 -0
- {paper → reproducible_research}/report.pdf +0 -0
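Scripts or links that still point at the old `paper/` directory need the new prefix after this commit. A minimal sketch of that rewrite, assuming paths are given relative to the repository root; the helper name `migrate_path` is hypothetical and not part of the repository:

```python
from pathlib import PurePosixPath

OLD_ROOT = "paper"
NEW_ROOT = "reproducible_research"


def migrate_path(path: str) -> str:
    """Rewrite a repository-relative path from the old root to the new one.

    Paths that do not start with the old root are returned unchanged.
    """
    parts = PurePosixPath(path).parts
    if parts and parts[0] == OLD_ROOT:
        return str(PurePosixPath(NEW_ROOT, *parts[1:]))
    return path


print(migrate_path("paper/expected_outputs/vbx/DIHARD.rttm"))
print(migrate_path("README.md"))
```

Matching on the first path component, rather than on a string prefix, avoids rewriting unrelated names that merely begin with `paper`.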
README.md
CHANGED
@@ -19,13 +19,9 @@ inference: false
  # pyannote.audio // speaker segmentation

- This model is described in the technical report *[End-to-end speaker segmentation for overlap-aware resegmentation](paper/report.pdf)*, by Hervé Bredin and Antoine Laurent.
-

-
-
- If you use this model for academic research, please consider citing the `pyannote.audio` library:

  ```bibtex
  @inproceedings{Bredin2020,
@@ -40,7 +36,8 @@ If you use this model for academic research, please consider citing the `pyannot
  ## Support

-

  ## Requirements

@@ -90,16 +87,6 @@ pipeline.instantiate(HYPER_PARAMETERS)
  vad = pipeline("audio.wav")
  ```

- In order to reproduce results of the [technical report](paper/report.pdf), one should use the following hyper-parameter values:
-
- Dataset | `onset` | `offset` | `min_duration_on` | `min_duration_off`
- ----------------|---------|----------|-------------------|-------------------
- AMI Mix-Headset | 0.851 | 0.430 | 0.115 | 0.146
- DIHARD3 | 0.855 | 0.292 | 0.036 | 0.001
- VoxConverse | 0.883 | 0.688 | 0.106 | 0.526
-
- We also provide the [expected output](tree/main/paper/expected_outputs/vad) on those three datasets in RTTM format.
-
  ### Overlapped speech detection

  ```python
@@ -109,16 +96,6 @@ pipeline.instantiate(HYPER_PARAMETERS)
  osd = pipeline("audio.wav")
  ```

- In order to reproduce results of the [technical report](paper/report.pdf), one should use the following hyper-parameter values:
-
- Dataset | `onset` | `offset` | `min_duration_on` | `min_duration_off`
- ----------------|---------|----------|-------------------|-------------------
- AMI Mix-Headset | 0.552 | 0.311 | 0.131 | 0.180
- DIHARD3 | 0.564 | 0.264 | 0.158 | 0.080
- VoxConverse | 0.617 | 0.387 | 0.367 | 0.334
-
- We also provide the [expected output](tree/main/paper/expected_outputs/osd) on those three datasets in RTTM format.
-
  ### Resegmentation

  ```python
@@ -126,27 +103,32 @@ from pyannote.audio.pipelines import Resegmentation
  pipeline = Resegmentation(segmentation="pyannote/segmentation",
                            diarization="baseline")
  pipeline.instantiate(HYPER_PARAMETERS)
  ```

-

-
  ----------------|---------|----------|-------------------|-------------------
  AMI Mix-Headset | 0.542 | 0.527 | 0.044 | 0.705
  DIHARD3 | 0.592 | 0.489 | 0.163 | 0.182
  VoxConverse | 0.537 | 0.724 | 0.410 | 0.563

-
-
- [VBx RTTM files](tree/main/paper/expected_outputs/vbx) are also provided in this repository for convenience:
-
- ```python
- from pyannote.database.utils import load_rttm
- vbx = load_rttm("paper/expected_outputs/vbx/DIHARD.rttm")
- resegmented_vbx = pipeline({"audio": "DH_EVAL_000.wav",
-                             "baseline": vbx["DH_EVAL_000"]})
- ```
-
-
- We also provide the [expected output](tree/main/paper/expected_outputs/rsg) on those three datasets in RTTM format.

@@ -19,13 +19,9 @@ inference: false
  # pyannote.audio // speaker segmentation


+ Model from *[End-to-end speaker segmentation for overlap-aware resegmentation](reproducible_research/report.pdf)*, by Hervé Bredin and Antoine Laurent.

  ```bibtex
  @inproceedings{Bredin2020,
@@ -40,7 +36,8 @@ If you use this model for academic research, please consider citing the `pyannot
  ## Support

+ For commercial enquiries and scientific consulting, please contact [me](mailto:[email protected]).
+ For [technical questions](https://github.com/pyannote/pyannote-audio/discussions) and [bug reports](https://github.com/pyannote/pyannote-audio/issues), please check [pyannote.audio](https://github.com/pyannote/pyannote-audio) Github repository.

  ## Requirements

@@ -90,16 +87,6 @@ pipeline.instantiate(HYPER_PARAMETERS)
  vad = pipeline("audio.wav")
  ```

  ### Overlapped speech detection

  ```python
@@ -109,16 +96,6 @@ pipeline.instantiate(HYPER_PARAMETERS)
  osd = pipeline("audio.wav")
  ```

  ### Resegmentation

  ```python
@@ -126,27 +103,32 @@ from pyannote.audio.pipelines import Resegmentation
  pipeline = Resegmentation(segmentation="pyannote/segmentation",
                            diarization="baseline")
  pipeline.instantiate(HYPER_PARAMETERS)
+ resegmented_baseline = pipeline({"audio": "audio.wav", "baseline": baseline})
+ # where `baseline` should be provided as a pyannote.core.Annotation instance
  ```

+ ## Reproducible research
+
+ In order to reproduce the results of the paper ["End-to-end speaker segmentation for overlap-aware resegmentation
+ "](reproducible_research/report.pdf), use the following hyper-parameters:

+ Voice activity detection | `onset` | `offset` | `min_duration_on` | `min_duration_off`
+ ----------------|---------|----------|-------------------|-------------------
+ AMI Mix-Headset | 0.851 | 0.430 | 0.115 | 0.146
+ DIHARD3 | 0.855 | 0.292 | 0.036 | 0.001
+ VoxConverse | 0.883 | 0.688 | 0.106 | 0.526
+
+ Overlapped speech detection | `onset` | `offset` | `min_duration_on` | `min_duration_off`
+ ----------------|---------|----------|-------------------|-------------------
+ AMI Mix-Headset | 0.552 | 0.311 | 0.131 | 0.180
+ DIHARD3 | 0.564 | 0.264 | 0.158 | 0.080
+ VoxConverse | 0.617 | 0.387 | 0.367 | 0.334
+
+ VBx resegmentation | `onset` | `offset` | `min_duration_on` | `min_duration_off`
  ----------------|---------|----------|-------------------|-------------------
  AMI Mix-Headset | 0.542 | 0.527 | 0.044 | 0.705
  DIHARD3 | 0.592 | 0.489 | 0.163 | 0.182
  VoxConverse | 0.537 | 0.724 | 0.410 | 0.563

+ Expected outputs (and VBx baseline) are also provided in the `/reproducible_research` sub-directories.

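The three tables in the new README can be turned into the `HYPER_PARAMETERS` dictionary that the snippets pass to `pipeline.instantiate(...)`. A minimal sketch, assuming the dictionary keys match the table column names (`onset`, `offset`, `min_duration_on`, `min_duration_off`); the `REPORTED` mapping, the `hyper_parameters` helper, and the short task keys (`vad`, `osd`, `rsg`, borrowed from the `expected_outputs` sub-directory names) are hypothetical names introduced here:

```python
# Column order used by the three tables in the README.
COLUMNS = ("onset", "offset", "min_duration_on", "min_duration_off")

# Values copied from the README tables, keyed by (task, dataset).
REPORTED = {
    ("vad", "AMI Mix-Headset"): (0.851, 0.430, 0.115, 0.146),
    ("vad", "DIHARD3"): (0.855, 0.292, 0.036, 0.001),
    ("vad", "VoxConverse"): (0.883, 0.688, 0.106, 0.526),
    ("osd", "AMI Mix-Headset"): (0.552, 0.311, 0.131, 0.180),
    ("osd", "DIHARD3"): (0.564, 0.264, 0.158, 0.080),
    ("osd", "VoxConverse"): (0.617, 0.387, 0.367, 0.334),
    ("rsg", "AMI Mix-Headset"): (0.542, 0.527, 0.044, 0.705),
    ("rsg", "DIHARD3"): (0.592, 0.489, 0.163, 0.182),
    ("rsg", "VoxConverse"): (0.537, 0.724, 0.410, 0.563),
}


def hyper_parameters(task: str, dataset: str) -> dict:
    """Return the reported values for one task and dataset, keyed by column name."""
    return dict(zip(COLUMNS, REPORTED[(task, dataset)]))


HYPER_PARAMETERS = hyper_parameters("vad", "DIHARD3")
print(HYPER_PARAMETERS)
# Assumed usage, following the README snippets:
#   pipeline.instantiate(HYPER_PARAMETERS)
```

An unknown task or dataset raises `KeyError`, which is preferable here to silently falling back to default thresholds.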
{paper → reproducible_research}/dihard3_custom_split/development.txt RENAMED (File without changes)
{paper → reproducible_research}/dihard3_custom_split/train.txt RENAMED (File without changes)
{paper → reproducible_research}/expected_outputs/osd/AMI.development.rttm RENAMED (File without changes)
{paper → reproducible_research}/expected_outputs/osd/AMI.test.rttm RENAMED (File without changes)
{paper → reproducible_research}/expected_outputs/osd/DIHARD.development.rttm RENAMED (File without changes)
{paper → reproducible_research}/expected_outputs/osd/DIHARD.test.rttm RENAMED (File without changes)
{paper → reproducible_research}/expected_outputs/osd/VoxConverse.development.rttm RENAMED (File without changes)
{paper → reproducible_research}/expected_outputs/osd/VoxConverse.test.rttm RENAMED (File without changes)
{paper → reproducible_research}/expected_outputs/rsg/AMI.development.rttm RENAMED (File without changes)
{paper → reproducible_research}/expected_outputs/rsg/AMI.test.rttm RENAMED (File without changes)
{paper → reproducible_research}/expected_outputs/rsg/DIHARD.development.rttm RENAMED (File without changes)
{paper → reproducible_research}/expected_outputs/rsg/DIHARD.test.rttm RENAMED (File without changes)
{paper → reproducible_research}/expected_outputs/rsg/VoxConverse.development.rttm RENAMED (File without changes)
{paper → reproducible_research}/expected_outputs/vad/AMI.development.rttm RENAMED (File without changes)
{paper → reproducible_research}/expected_outputs/vad/AMI.test.rttm RENAMED (File without changes)
{paper → reproducible_research}/expected_outputs/vad/DIHARD.development.rttm RENAMED (File without changes)
{paper → reproducible_research}/expected_outputs/vad/DIHARD.test.rttm RENAMED (File without changes)
{paper → reproducible_research}/expected_outputs/vad/VoxConverse.development.rttm RENAMED (File without changes)
{paper → reproducible_research}/expected_outputs/vad/VoxConverse.test.rttm RENAMED (File without changes)
{paper → reproducible_research}/expected_outputs/vbx/AMI.rttm RENAMED (File without changes)
{paper → reproducible_research}/expected_outputs/vbx/DIHARD.rttm RENAMED (File without changes)
{paper → reproducible_research}/expected_outputs/vbx/VoxConverse.rttm RENAMED (File without changes)
{paper → reproducible_research}/report.pdf RENAMED (File without changes)
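The renamed `.rttm` files are plain text, so they can be inspected without extra dependencies. A minimal sketch, assuming the usual RTTM `SPEAKER` record layout (record type, file id, channel, onset, duration, two placeholder fields, then the speaker name in the eighth field); `parse_rttm` is a hypothetical helper, not the `load_rttm` function from `pyannote.database`, and the sample records are made up for illustration:

```python
from collections import defaultdict


def parse_rttm(text: str) -> dict:
    """Group (start, end, speaker) turns by file id from RTTM text."""
    turns = defaultdict(list)
    for line in text.splitlines():
        fields = line.split()
        # Skip blank lines and anything that is not a SPEAKER record.
        if len(fields) < 8 or fields[0] != "SPEAKER":
            continue
        file_id = fields[1]
        onset = float(fields[3])
        duration = float(fields[4])
        speaker = fields[7]
        turns[file_id].append((onset, onset + duration, speaker))
    return dict(turns)


# Made-up sample content, for illustration only.
sample = """\
SPEAKER DH_EVAL_000 1 0.50 1.25 <NA> <NA> spk01 <NA> <NA>
SPEAKER DH_EVAL_000 1 2.00 0.50 <NA> <NA> spk02 <NA> <NA>
"""
print(parse_rttm(sample)["DH_EVAL_000"])
```

The README expects `baseline` as a `pyannote.core.Annotation` instance, so these tuples would still need converting before being passed to the resegmentation pipeline; that step is left out here.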