Transcribe audio/video to text and generate SRT subtitles
Separate audio into four stems: vocals, bass, drums, and other
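
For the SRT output mentioned above, a minimal sketch of how timed transcript segments could be written in SRT form. The helper names (`format_timestamp`, `segments_to_srt`) and the `(start, end, text)` segment shape are assumptions made for illustration, not this tool's actual interface.

```python
from typing import Iterable, Tuple

# Hypothetical segment shape: (start_seconds, end_seconds, text).
Segment = Tuple[float, float, str]


def format_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments: Iterable[Segment]) -> str:
    """Render segments as SRT cues: index line, time range line, text line."""
    blocks = []
    for index, (start, end, text) in enumerate(segments, start=1):
        time_range = f"{format_timestamp(start)} --> {format_timestamp(end)}"
        blocks.append(f"{index}\n{time_range}\n{text.strip()}\n")
    # Joining with a newline leaves one blank line between consecutive cues.
    return "\n".join(blocks)


if __name__ == "__main__":
    demo = [(0.0, 1.5, "Hello"), (1.5, 3.0, "world")]
    print(segments_to_srt(demo))
```

Each cue is a sequence number, a `start --> end` line with a comma before the milliseconds, and the subtitle text, with a blank line between cues.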