Hervé BREDIN committed on
Commit b5fc3ba · 1 Parent(s): 36f7729

hub: gate model

Files changed (1): README.md (+49 -37)
README.md CHANGED
@@ -17,31 +17,37 @@ datasets:
  - voxconverse
  license: mit
  inference: false
+ extra_gated_prompt: "The collected information will help acquire a better knowledge of pyannote.audio userbase and help its maintainers apply for grants to improve it further. If you are an academic researcher, please cite the relevant papers in your own publications using the model. If you work for a company, please consider contributing back to pyannote.audio development (e.g. through unrestricted gifts). We also provide scientific consulting services around speaker diarization and machine listening."
+ extra_gated_fields:
+   Company/university: text
+   Website: text
+   I plan to use this model for (task, type of audio data, etc): text
  ---
 
  # 🎹 Speaker segmentation
 
- ![Example](example.png)
-
- Model from *[End-to-end speaker segmentation for overlap-aware resegmentation](http://arxiv.org/abs/2104.04045)*,
- by Hervé Bredin and Antoine Laurent.
-
- [Online demo](https://huggingface.co/spaces/pyannote/pretrained-pipelines) is available as a Hugging Face Space.
-
- ## Support
+ [Paper](http://arxiv.org/abs/2104.04045) | [Demo](https://huggingface.co/spaces/pyannote/pretrained-pipelines) | [Blog post](https://herve.niderb.fr/fastpages/2022/10/23/One-speaker-segmentation-model-to-rule-them-all)
 
- For commercial enquiries and scientific consulting, please contact [me](mailto:herve@niderb.fr).
- For [technical questions](https://github.com/pyannote/pyannote-audio/discussions) and [bug reports](https://github.com/pyannote/pyannote-audio/issues), please check the [pyannote.audio](https://github.com/pyannote/pyannote-audio) GitHub repository.
+ ![Example](example.png)
 
  ## Usage
 
- Relies on pyannote.audio 2.0 currently in development: see [installation instructions](https://github.com/pyannote/pyannote-audio/tree/develop#installation).
+ Relies on pyannote.audio 2.1: see [installation instructions](https://github.com/pyannote/pyannote-audio).
+
+ ```python
+ # 1. visit hf.co/pyannote/segmentation and accept user conditions (only if requested)
+ # 2. visit hf.co/settings/tokens to create an access token (only if you had to go through 1.)
+ # 3. instantiate pretrained model
+ from pyannote.audio import Model
+ model = Model.from_pretrained("pyannote/segmentation",
+                               use_auth_token="ACCESS_TOKEN_GOES_HERE")
+ ```
 
  ### Voice activity detection
 
  ```python
  from pyannote.audio.pipelines import VoiceActivityDetection
- pipeline = VoiceActivityDetection(segmentation="pyannote/segmentation")
+ pipeline = VoiceActivityDetection(segmentation=model)
  HYPER_PARAMETERS = {
    # onset/offset activation thresholds
    "onset": 0.5, "offset": 0.5,
@@ -59,7 +65,7 @@ vad = pipeline("audio.wav")
 
  ```python
  from pyannote.audio.pipelines import OverlappedSpeechDetection
- pipeline = OverlappedSpeechDetection(segmentation="pyannote/segmentation")
+ pipeline = OverlappedSpeechDetection(segmentation=model)
  pipeline.instantiate(HYPER_PARAMETERS)
  osd = pipeline("audio.wav")
  # `osd` is a pyannote.core.Annotation instance containing overlapped speech regions
@@ -69,7 +75,7 @@ osd = pipeline("audio.wav")
 
  ```python
  from pyannote.audio.pipelines import Resegmentation
- pipeline = Resegmentation(segmentation="pyannote/segmentation",
+ pipeline = Resegmentation(segmentation=model,
                            diarization="baseline")
  pipeline.instantiate(HYPER_PARAMETERS)
  resegmented_baseline = pipeline({"audio": "audio.wav", "baseline": baseline})
@@ -80,13 +86,41 @@ resegmented_baseline = pipeline({"audio": "audio.wav", "baseline": baseline})
 
  ```python
  from pyannote.audio import Inference
- inference = Inference("pyannote/segmentation")
+ inference = Inference(model)
  segmentation = inference("audio.wav")
  # `segmentation` is a pyannote.core.SlidingWindowFeature
  # instance containing raw segmentation scores like the
  # one pictured above (output)
  ```
 
+ ## Support
+
+ For commercial enquiries and scientific consulting, please contact [me](mailto:herve@niderb.fr).
+ For [technical questions](https://github.com/pyannote/pyannote-audio/discussions) and [bug reports](https://github.com/pyannote/pyannote-audio/issues), please check the [pyannote.audio](https://github.com/pyannote/pyannote-audio) GitHub repository.
+
+ ## Citation
+
+ ```bibtex
+ @inproceedings{Bredin2021,
+   Title = {{End-to-end speaker segmentation for overlap-aware resegmentation}},
+   Author = {{Bredin}, Herv{\'e} and {Laurent}, Antoine},
+   Booktitle = {Proc. Interspeech 2021},
+   Address = {Brno, Czech Republic},
+   Month = {August},
+   Year = {2021},
+ }
+ ```
+
+ ```bibtex
+ @inproceedings{Bredin2020,
+   Title = {{pyannote.audio: neural building blocks for speaker diarization}},
+   Author = {{Bredin}, Herv{\'e} and {Yin}, Ruiqing and {Coria}, Juan Manuel and {Gelly}, Gregory and {Korshunov}, Pavel and {Lavechin}, Marvin and {Fustes}, Diego and {Titeux}, Hadrien and {Bouaziz}, Wassim and {Gill}, Marie-Philippe},
+   Booktitle = {ICASSP 2020, IEEE International Conference on Acoustics, Speech, and Signal Processing},
+   Address = {Barcelona, Spain},
+   Month = {May},
+   Year = {2020},
+ }
+ ```
+
  ## Reproducible research
 
  In order to reproduce the results of the paper ["End-to-end speaker segmentation for overlap-aware resegmentation
@@ -112,25 +146,3 @@ In order to reproduce the results of the paper ["End-to-end speaker segmentation
 
  Expected outputs (and VBx baseline) are also provided in the `/reproducible_research` sub-directories.
 
- ## Citation
-
- ```bibtex
- @inproceedings{Bredin2021,
-   Title = {{End-to-end speaker segmentation for overlap-aware resegmentation}},
-   Author = {{Bredin}, Herv{\'e} and {Laurent}, Antoine},
-   Booktitle = {Proc. Interspeech 2021},
-   Address = {Brno, Czech Republic},
-   Month = {August},
-   Year = {2021},
- }
- ```
-
- ```bibtex
- @inproceedings{Bredin2020,
-   Title = {{pyannote.audio: neural building blocks for speaker diarization}},
-   Author = {{Bredin}, Herv{\'e} and {Yin}, Ruiqing and {Coria}, Juan Manuel and {Gelly}, Gregory and {Korshunov}, Pavel and {Lavechin}, Marvin and {Fustes}, Diego and {Titeux}, Hadrien and {Bouaziz}, Wassim and {Gill}, Marie-Philippe},
-   Booktitle = {ICASSP 2020, IEEE International Conference on Acoustics, Speech, and Signal Processing},
-   Address = {Barcelona, Spain},
-   Month = {May},
-   Year = {2020},
- }
- ```
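The hunks above stop in the middle of the `HYPER_PARAMETERS` dictionary and never show the gated model and the voice activity detection pipeline being used end to end. The following sketch stitches the snippets together; the `min_duration_on` / `min_duration_off` entries and the speech-region loop are illustrative assumptions, not part of this diff:

```python
# Hedged end-to-end sketch based on the snippets above; hyper-parameter names
# other than "onset"/"offset" are assumed, not taken from this diff.
from pyannote.audio import Model
from pyannote.audio.pipelines import VoiceActivityDetection

model = Model.from_pretrained("pyannote/segmentation",
                              use_auth_token="ACCESS_TOKEN_GOES_HERE")

pipeline = VoiceActivityDetection(segmentation=model)
pipeline.instantiate({
    "onset": 0.5, "offset": 0.5,   # activation thresholds shown in the diff
    "min_duration_on": 0.0,        # assumed: drop very short speech regions
    "min_duration_off": 0.0,       # assumed: fill very short non-speech gaps
})

vad = pipeline("audio.wav")        # pyannote.core.Annotation of speech regions
for segment in vad.get_timeline().support():
    print(f"speech from {segment.start:.1f}s to {segment.end:.1f}s")
```

According to the diff, the same hyper-parameter dictionary is reused verbatim by `OverlappedSpeechDetection` and `Resegmentation`.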
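The raw-scores hunk only notes that `segmentation` is a `pyannote.core.SlidingWindowFeature`. A minimal sketch of inspecting that object, assuming its usual `data` array and `sliding_window` attributes:

```python
# Minimal sketch; assumes pyannote.core.SlidingWindowFeature exposes a `data`
# array of raw activation scores and a `sliding_window` describing the frames.
from pyannote.audio import Inference, Model

model = Model.from_pretrained("pyannote/segmentation",
                              use_auth_token="ACCESS_TOKEN_GOES_HERE")
inference = Inference(model)
segmentation = inference("audio.wav")

scores = segmentation.data              # raw segmentation scores (NumPy array)
frames = segmentation.sliding_window    # pyannote.core.SlidingWindow
print(scores.shape)
print(frames.duration, frames.step)     # frame extent and hop, in seconds
```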
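Finally, the resegmentation hunk relies on a `baseline` variable that the excerpt never defines. A hedged sketch, assuming the baseline diarization sits in an RTTM file and that `pyannote.database`'s `load_rttm` helper is available; any existing `pyannote.core.Annotation` would do:

```python
# Hedged sketch: `load_rttm`, the "audio" URI key and the extra hyper-parameters
# are assumptions; the diff only requires `baseline` to be an Annotation.
from pyannote.audio import Model
from pyannote.audio.pipelines import Resegmentation
from pyannote.database.util import load_rttm

model = Model.from_pretrained("pyannote/segmentation",
                              use_auth_token="ACCESS_TOKEN_GOES_HERE")

# diarization output of another system, keyed by file URI in the RTTM
baseline = load_rttm("baseline.rttm")["audio"]

pipeline = Resegmentation(segmentation=model, diarization="baseline")
pipeline.instantiate({"onset": 0.5, "offset": 0.5,
                      "min_duration_on": 0.0, "min_duration_off": 0.0})
resegmented_baseline = pipeline({"audio": "audio.wav", "baseline": baseline})
```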