Target Speaker Extraction with WeSep
Transcribe audio to text
Generate speech from text using a reference voice