# Demucs APIs
## Quick start
Notes: Type hints have been added to all API functions. It is recommended to check them before passing parameters to a function, as some arguments only support a limited set of types (e.g. the `repo` parameter of `load_model` only supports `pathlib.Path`).
1. The first step is to import the api module:
```python
import demucs.api
```
2. Then initialize the `Separator`. Parameters passed here will serve as default values for the methods. The model should be specified.
```python
# Initialize with default parameters:
separator = demucs.api.Separator()
# Use another model and segment:
separator = demucs.api.Separator(model="mdx_extra", segment=12)
# You can also use other parameters defined
```
3. Separate it!
```python
# Separating an audio file
origin, separated = separator.separate_audio_file("file.mp3")
# Separating a loaded audio
origin, separated = separator.separate_tensor(audio)
# If you encounter an error like CUDA out of memory, you can use this to change parameters like `segment`:
separator.update_parameter(segment=smaller_segment)
```
4. Save audio
```python
# Remember to create the destination folder before calling `save_audio`,
# or you are likely to receive a `FileNotFoundError`
for stem, source in separated.items():
    demucs.api.save_audio(source, f"{stem}_file.mp3", samplerate=separator.samplerate)
```
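Since `save_audio` will not create missing directories, here is a minimal sketch that writes every stem into a dedicated output folder created beforehand (the folder and file names are placeholders, not part of the API):
```python
from pathlib import Path

import demucs.api

separator = demucs.api.Separator()
origin, separated = separator.separate_audio_file("file.mp3")

out_dir = Path("separated")
out_dir.mkdir(parents=True, exist_ok=True)  # avoid FileNotFoundError when saving
for stem, source in separated.items():
    demucs.api.save_audio(source, str(out_dir / f"{stem}.wav"), samplerate=separator.samplerate)
```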
## API References
The types of each parameter and return value are not listed in this document. For the exact types, please read the type hints in api.py (most modern code editors can infer types based on type hints).
### `class Separator`
The base separator class
##### Parameters
model: Pretrained model name or signature. Default is htdemucs.
repo: Folder containing all pre-trained models for use.
segment: Length (in seconds) of each segment (only available if `split` is `True`). If not specified, will use the command line option.
shifts: If > 0, will shift the input `wav` in time by a random amount between 0 and 0.5 sec and apply the opposite shift to the output. This is repeated `shifts` times and all predictions are averaged. This effectively makes the model time equivariant and improves SDR by up to 0.2 points. If not specified, will use the command line option.
split: If True, the input will be broken down into small chunks (length set by `segment`) and predictions will be performed individually on each and concatenated. Useful for models with a large memory footprint like Tasnet. If not specified, will use the command line option.
overlap: The overlap between the splits. If not specified, will use the command line option.
device (torch.device, str, or None): If provided, device on which to execute the computation, otherwise `wav.device` is assumed. When `device` is different from `wav.device`, only local computations will be on `device`, while the entire tracks will be stored on `wav.device`. If not specified, will use the command line option.
jobs: Number of jobs. This can increase memory usage but will be much faster when multiple cores are available. If not specified, will use the command line option.
callback: A function that will be called when the separation of a chunk starts or finishes. The argument passed to the function will be a dict. For more information, please see the Callback section.
callback_arg: A dict containing private parameters to be passed to callback function. For more information, please see the Callback section.
progress: If True, show a progress bar.
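For instance, a sketch of a constructor call that sets several of these defaults at once (the values are purely illustrative, not recommendations):
```python
import demucs.api

separator = demucs.api.Separator(
    model="htdemucs",  # pretrained model name or signature
    device="cpu",      # run on the CPU instead of the default device
    shifts=2,          # average two random shifts for a small SDR gain
    overlap=0.25,      # overlap between chunks when split=True
    jobs=2,            # parallel workers: more memory, faster on multi-core machines
)
```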
##### Notes for callback
The function will be called with only one positional parameter, whose type is `dict`. The `callback_arg` will be combined with information about the current separation progress. The progress information will override the values in `callback_arg` if the same key is used. To abort the separation, raise an exception inside `callback`, and handle it yourself if you want your code to continue running.
Progress information contains several keys (these keys will always exist):
- `model_idx_in_bag`: The index of the submodel in `BagOfModels`. Starts from 0.
- `shift_idx`: The index of shifts. Starts from 0.
- `segment_offset`: The offset of the current segment, measured in tensor "frames" (samples), not seconds. For example, 441000 means the 441000th sample of the audio, not the 441000th second.
- `state`: Could be `"start"` or `"end"`.
- `audio_length`: Length of the audio (in tensor "frames", i.e. samples).
- `models`: Count of submodels in the model.
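A minimal sketch of a callback that reports progress using the keys above (the printed format and the contents of `callback_arg` are illustrative assumptions):
```python
import demucs.api

def report(info: dict):
    # `info` merges `callback_arg` with the progress keys listed above;
    # the progress keys win when names collide.
    if info["state"] == "end":
        done = info["segment_offset"] / info["audio_length"]
        print(f"{info['track']}: model {info['model_idx_in_bag'] + 1}/{info['models']}, "
              f"shift {info['shift_idx']}, roughly {done:.0%} done")

separator = demucs.api.Separator(
    callback=report,
    callback_arg={"track": "file.mp3"},  # private data forwarded to the callback
)
origin, separated = separator.separate_audio_file("file.mp3")
```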
#### `property samplerate`
A read-only property holding the sample rate that the model requires. Will raise a warning and return the default value if the model is not loaded.
#### `property audio_channels`
A read-only property holding the number of audio channels that the model requires. Will raise a warning and return the default value if the model is not loaded.
#### `property model`
A read-only property holding the model.
#### `method update_parameter()`
Update the parameters of separation.
##### Parameters
segment: Length (in seconds) of each segment (only available if `split` is `True`). If not specified, will use the command line option.
shifts: If > 0, will shift the input `wav` in time by a random amount between 0 and 0.5 sec and apply the opposite shift to the output. This is repeated `shifts` times and all predictions are averaged. This effectively makes the model time equivariant and improves SDR by up to 0.2 points. If not specified, will use the command line option.
split: If True, the input will be broken down into small chunks (length set by `segment`) and predictions will be performed individually on each and concatenated. Useful for models with a large memory footprint like Tasnet. If not specified, will use the command line option.
overlap: The overlap between the splits. If not specified, will use the command line option.
device (torch.device, str, or None): If provided, device on which to execute the computation, otherwise `wav.device` is assumed. When `device` is different from `wav.device`, only local computations will be on `device`, while the entire tracks will be stored on `wav.device`. If not specified, will use the command line option.
jobs: Number of jobs. This can increase memory usage but will be much faster when multiple cores are available. If not specified, will use the command line option.
callback: A function that will be called when the separation of a chunk starts or finishes. The argument passed to the function will be a dict. For more information, please see the Callback section.
callback_arg: A dict containing private parameters to be passed to callback function. For more information, please see the Callback section.
progress: If True, show a progress bar.
##### Notes for callback
The function will be called with only one positional parameter, whose type is `dict`. The `callback_arg` will be combined with information about the current separation progress. The progress information will override the values in `callback_arg` if the same key is used. To abort the separation, raise an exception inside `callback`, and handle it yourself if you want your code to continue running.
Progress information contains several keys (these keys will always exist):
- `model_idx_in_bag`: The index of the submodel in `BagOfModels`. Starts from 0.
- `shift_idx`: The index of shifts. Starts from 0.
- `segment_offset`: The offset of the current segment, measured in tensor "frames" (samples), not seconds. For example, 441000 means the 441000th sample of the audio, not the 441000th second.
- `state`: Could be `"start"` or `"end"`.
- `audio_length`: Length of the audio (in tensor "frames", i.e. samples).
- `models`: Count of submodels in the model.
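A typical use is retrying with a smaller `segment` after running out of GPU memory. A sketch under the assumption that the out-of-memory error surfaces as a `RuntimeError` (as CUDA out-of-memory errors generally do; the segment value is illustrative):
```python
import demucs.api

separator = demucs.api.Separator(model="htdemucs")
try:
    origin, separated = separator.separate_audio_file("file.mp3")
except RuntimeError:
    # Assumed recovery strategy: shrink the chunk length and retry the same file.
    separator.update_parameter(segment=5)
    origin, separated = separator.separate_audio_file("file.mp3")
```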
#### `method separate_tensor()`
Separate an audio.
##### Parameters
wav: Waveform of the audio. Must have 2 dimensions: the first is the audio channels and the second is the waveform of each channel, e.g. `tuple(wav.shape) == (2, 884000)` means the audio has 2 channels.
sr: Sample rate of the original audio; the waveform will be resampled if it doesn't match the model's sample rate.
##### Returns
A tuple whose first element is the original waveform and whose second element is a dict mapping stem names to the separated waveforms. The original waveform will already have been resampled.
##### Notes
Use this function with caution: it does not validate the input data.
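A minimal sketch of separating a waveform that is already in memory, loaded here with `torchaudio` (the file name is a placeholder; passing `sr` lets the separator resample to the model's rate):
```python
import torchaudio

import demucs.api

separator = demucs.api.Separator()
wav, sr = torchaudio.load("file.wav")  # wav has shape (channels, frames)
origin, separated = separator.separate_tensor(wav, sr)
print({stem: tuple(t.shape) for stem, t in separated.items()})
```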
#### `method separate_audio_file()`
Separate an audio file. The method will automatically read the file.
##### Parameters
wav: Path of the file to be separated.
##### Returns
A tuple whose first element is the original waveform and whose second element is a dict mapping stem names to the separated waveforms. The original waveform will already have been resampled.
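For example (the path is a placeholder, and the stem names depend on the loaded model):
```python
import demucs.api

separator = demucs.api.Separator()
origin, separated = separator.separate_audio_file("file.mp3")
print(list(separated.keys()))  # e.g. ["drums", "bass", "other", "vocals"]
```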
### `function save_audio()`
Save audio file.
##### Parameters
wav: Audio to be saved.
path: The file path to save to. The suffix must be `.mp3` or `.wav`.
samplerate: File sample rate.
bitrate: If the suffix of `path` is `.mp3`, specifies the bitrate of the mp3.
clip: Strategy for preventing clipping.
bits_per_sample: If the suffix of `path` is `.wav`, specifies the bit depth of the wav.
as_float: If True and the suffix of `path` is `.wav`, `bits_per_sample` will be set to 32 and the wave file will be written in float format.
##### Returns
None
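A sketch of saving one stem both as an mp3 and as a 24-bit wav (the stem name, bitrate, and paths are illustrative; the destination folder must already exist, and the "vocals" stem assumes a model that produces one):
```python
import demucs.api

separator = demucs.api.Separator()
origin, separated = separator.separate_audio_file("file.mp3")

vocals = separated["vocals"]  # assumes the loaded model has a "vocals" stem
demucs.api.save_audio(vocals, "vocals.mp3", samplerate=separator.samplerate, bitrate=320)
demucs.api.save_audio(vocals, "vocals.wav", samplerate=separator.samplerate, bits_per_sample=24)
```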
### `function list_models()`
List the available models. Please note that not all of the returned models can be successfully loaded.
##### Parameters
repo: The repo whose models are to be listed.
##### Returns
A dict with two keys ("single" for single models and "bag" for bags of models). The values are lists of strings.
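For example (a sketch; the output depends on the remote repo, and not every listed model is guaranteed to load):
```python
import demucs.api

models = demucs.api.list_models()
print("single models:", models["single"])
print("bags of models:", models["bag"])
```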