---
title: Accent Classifier + Transcriber
emoji: 🎙️
colorFrom: indigo
colorTo: purple
sdk: gradio
sdk_version: "4.20.0"
app_file: app.py
pinned: false
---


# Accent Classifier + Speech Transcriber

This Gradio app allows you to:

- Upload or link to audio/video files
- Automatically transcribe the speech (via OpenAI Whisper)
- Detect the speaker's accent (28-class Wav2Vec2 model)
- View a top-5 ranked list of likely accents with confidence scores

---

## How to Use

Option 1: Upload an audio file
- Supported formats: .mp3, .wav

Option 2: Upload a video file
- Supported format: .mp4 (audio will be extracted automatically)

Option 3: Paste a direct .mp4 video URL
- Must be a direct video file URL (not a webpage)
- Example: a file hosted on archive.org or a CDN
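The three input options map onto a small dispatch step. Below is a minimal sketch of that step. It recognizes inputs by file extension, which is a simple heuristic: a direct CDN link without a `.mp4` suffix would be rejected. `classify_input` is a hypothetical helper, not the app's actual code.

```python
from pathlib import Path
from urllib.parse import urlparse

# Extensions listed under "How to Use"; grouping them this way is an assumption.
AUDIO_EXTS = {".mp3", ".wav"}
VIDEO_EXTS = {".mp4"}


def classify_input(source: str) -> str:
    """Return "url", "audio", or "video" for a user-supplied path or URL.

    Hypothetical helper for illustration; not taken from app.py.
    """
    parsed = urlparse(source)
    if parsed.scheme in ("http", "https"):
        # Accept only direct .mp4 links; the suffix is read from the URL path,
        # so query strings such as "?download=1" are ignored.
        if Path(parsed.path).suffix.lower() != ".mp4":
            raise ValueError(f"Not a direct .mp4 URL: {source}")
        return "url"
    suffix = Path(source).suffix.lower()
    if suffix in AUDIO_EXTS:
        return "audio"
    if suffix in VIDEO_EXTS:
        return "video"
    raise ValueError(f"Unsupported file type: {suffix or '(none)'}")
```

A YouTube watch URL has no `.mp4` suffix in its path, so this sketch rejects it, which matches the limitation described below.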

---


## Not Supported

- Loom, YouTube, Dropbox, or other web page links (these open a web page rather than serving the video file itself)

To process a video from one of these services, download it manually and upload the file instead.
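One way to tell a direct file link from a web page is to check the `Content-Type` header that the server returns. The sketch below uses only the standard library. It assumes that a `video/*` type, or a generic `application/octet-stream`, indicates a downloadable file. `is_video_content_type` and `probe_url` are hypothetical helpers, not part of the app.

```python
import urllib.request


def is_video_content_type(content_type: str) -> bool:
    """True if a Content-Type header names a video or generic binary payload.

    Treating application/octet-stream as acceptable is an assumption: some
    CDNs serve .mp4 files with that generic type.
    """
    # Drop parameters such as "; charset=utf-8" and normalize case.
    mime = content_type.split(";", 1)[0].strip().lower()
    return mime.startswith("video/") or mime == "application/octet-stream"


def probe_url(url: str, timeout: float = 10.0) -> bool:
    """Send a HEAD request and report whether the URL appears to serve a video.

    Hypothetical helper: servers that reject HEAD will make urlopen raise.
    """
    req = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return is_video_content_type(resp.headers.get("Content-Type", ""))
```

A page link such as a YouTube or Loom URL typically comes back as `text/html`, so it fails this check before any download is attempted.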

---

## Models Used

Transcription:
- openai/whisper-tiny: https://huggingface.co/openai/whisper-tiny

Accent Classification:
- ylacombe/accent-classifier: https://huggingface.co/ylacombe/accent-classifier

---

## Dependencies

On Hugging Face Spaces, dependencies are installed automatically.
For local testing, install them with:

`pip install gradio transformers torch moviepy requests safetensors soundfile scipy`

You must also install ffmpeg:

- macOS: `brew install ffmpeg`
- Ubuntu: `sudo apt install ffmpeg`
- Windows: download a build from https://ffmpeg.org/
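ffmpeg can pull the audio track out of an .mp4 and resample it in a single call. The sketch below runs such a call from Python. It assumes mono, 16-bit PCM WAV output at 16 kHz. `ffmpeg_extract_cmd` and `extract_audio` are hypothetical names, and the app itself may go through moviepy, which also drives ffmpeg, rather than calling ffmpeg directly.

```python
import shutil
import subprocess


def ffmpeg_extract_cmd(src: str, dst: str, sample_rate: int = 16000) -> list[str]:
    """Build an ffmpeg command that drops video and writes mono 16-bit PCM WAV."""
    return [
        "ffmpeg",
        "-y",                      # overwrite dst if it already exists
        "-i", src,                 # input file (.mp4, .mp3, .wav, ...)
        "-vn",                     # discard any video stream
        "-ac", "1",                # downmix to mono (an assumption)
        "-ar", str(sample_rate),   # resample to the target rate
        "-acodec", "pcm_s16le",    # uncompressed 16-bit WAV samples
        dst,
    ]


def extract_audio(src: str, dst: str) -> None:
    """Run the command, failing early if ffmpeg is not installed."""
    if shutil.which("ffmpeg") is None:
        raise RuntimeError("ffmpeg not found on PATH; install it first")
    # check=True raises CalledProcessError if ffmpeg exits with an error.
    subprocess.run(ffmpeg_extract_cmd(src, dst), check=True, capture_output=True)
```

Building the argument list separately from running it keeps the exact flags easy to inspect and test without ffmpeg installed.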

---

## How It Works

1. Audio is extracted (if input is a video)
2. Audio is converted to .wav and resampled to 16 kHz
3. Speech is transcribed using Whisper
4. Accent is classified using a Wav2Vec2 model
5. Output includes:
   - Top accent prediction
   - Confidence score
   - Top-5 accent list
   - Full transcription
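The confidence score and top-5 list in step 5 can be derived from the classifier's raw logits with a softmax followed by a sort. The sketch below does this in plain Python. It assumes the model returns one logit per accent class, in the same order as its label list. `top_k_accents` and the example labels are hypothetical and are not the model's actual label set.

```python
import math


def top_k_accents(logits: list[float], labels: list[str], k: int = 5) -> list[tuple[str, float]]:
    """Turn raw logits into probabilities and return the k most likely labels."""
    if len(logits) != len(labels):
        raise ValueError("logits and labels must have the same length")
    # Numerically stable softmax: subtract the max logit before exponentiating.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Highest probability first; the first entry is the top prediction.
    ranked = sorted(zip(labels, probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


if __name__ == "__main__":
    # Hypothetical labels and logits, for illustration only.
    labels = ["us", "england", "indian", "australia"]
    logits = [1.2, 3.4, 0.5, 2.0]
    for label, p in top_k_accents(logits, labels, k=3):
        print(f"{label}: {p:.1%}")
```

With a PyTorch model, you could convert the logits tensor for a single clip with `logits[0].tolist()` before passing it in. The first returned pair gives the top accent, and its probability serves as the confidence score.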

---