---
title: Accent Classifier + Transcriber
emoji: 🎙️
colorFrom: indigo
colorTo: purple
sdk: gradio
sdk_version: "4.20.0"
app_file: app.py
pinned: false
---

# Accent Classifier + Speech Transcriber

This Gradio app allows you to:

- Upload or link to audio/video files
- Automatically transcribe the speech (via OpenAI Whisper)
- Detect the speaker's accent (28-class Wav2Vec2 model)
- View a top-5 ranked list of likely accents with confidence scores

---

## How to Use

Option 1: Upload an audio file  
- Supported formats: .mp3, .wav

Option 2: Upload a video file  
- Supported format: .mp4 (audio will be extracted automatically)

Option 3: Paste a direct .mp4 video URL  
- Must be a direct video file URL (not a webpage)  
- Example: a file hosted on archive.org or a CDN
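
The distinction between a direct file URL and a webpage link can be checked up front. The following is a minimal sketch, not the app's actual validation: the function name `looks_like_direct_mp4` and the `PAGE_HOSTS` set are hypothetical, and the check is a heuristic based only on the URL's host and path extension.

```python
from urllib.parse import urlparse
import posixpath

# Hosts that serve a player page rather than a raw file.
# Illustrative only (assumption): this list is not exhaustive.
PAGE_HOSTS = {"youtube.com", "youtu.be", "loom.com", "dropbox.com"}


def looks_like_direct_mp4(url: str) -> bool:
    """Heuristic: True if the URL path ends in .mp4 and the host is not a known page host."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https"):
        return False
    host = (parsed.hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    if host in PAGE_HOSTS:
        return False
    # Compare the extension of the path only, ignoring any query string.
    return posixpath.splitext(parsed.path)[1].lower() == ".mp4"
```

A URL that passes this check can still fail to download or decode, so it only filters out the obvious cases before any network request is made.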

---

## Not Supported

- Loom, YouTube, Dropbox, and other webpage links are not supported: they return a web page rather than a raw video file
- As a workaround, download the video manually and upload the file

---

## Models Used

**Transcription:**  
- [openai/whisper-tiny](https://huggingface.co/openai/whisper-tiny)

**Accent Classification:**  
- [ylacombe/accent-classifier](https://huggingface.co/ylacombe/accent-classifier)

---

## Running Locally

To set up and run the app locally, follow these steps:

1. **Clone the repository**  
```bash
git clone https://huggingface.co/spaces/usamaijaz-ai/accent-classifier
cd accent-classifier
```

2. **Create a virtual environment (optional but recommended)**  
```bash
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
```

3. **Install the dependencies**  
```bash
pip install -r requirements.txt
```

If there’s no `requirements.txt`, use:
```bash
pip install gradio==4.20.0 transformers torch moviepy==1.0.3 requests safetensors soundfile scipy
```

4. **Install ffmpeg**  
- **macOS:** `brew install ffmpeg`  
- **Ubuntu:** `sudo apt install ffmpeg`  
- **Windows:** [Download here](https://ffmpeg.org/download.html) and add to PATH

5. **Run the app**  
```bash
python app.py
```

6. **Access in your browser**  
Visit `http://localhost:7860` to use the app locally.

---

## How It Works

1. Audio is extracted (if input is a video)
2. Audio is converted to `.wav` and resampled to 16 kHz
3. Speech is transcribed using Whisper
4. Accent is classified using a Wav2Vec2 model
5. Output includes:
   - Top accent prediction
   - Confidence score
   - Top-5 accent list
   - Full transcription
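
The top-5 list and confidence scores in step 5 can be illustrated with a small sketch. It assumes the classifier produces one raw score (logit) per accent class and that confidences come from a softmax over those scores; the function name `top_k_accents` and the example labels are hypothetical and not taken from `app.py`.

```python
import math


def top_k_accents(logits, labels, k=5):
    """Return the k most likely (label, probability) pairs, highest first."""
    if len(logits) != len(labels):
        raise ValueError("logits and labels must have the same length")
    # Softmax, shifted by the maximum logit for numerical stability.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Rank classes by probability, highest first, and keep the first k.
    ranked = sorted(zip(labels, probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


# Hypothetical scores and labels, for illustration only.
example_labels = ["us", "england", "india", "australia", "canada", "scotland"]
example_logits = [2.0, 0.5, 1.0, -1.0, 0.0, 3.0]
top5 = top_k_accents(example_logits, example_labels)
```

The first entry of the returned list corresponds to the top accent prediction and its confidence score; the remaining entries fill out the ranked list.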

---