---
title: Accent Classifier + Transcriber
emoji: 🎙️
colorFrom: indigo
colorTo: purple
sdk: gradio
sdk_version: "4.20.0"
app_file: app.py
pinned: false
---
# Accent Classifier + Speech Transcriber
This Gradio app allows you to:
- Upload or link to audio/video files
- Automatically transcribe the speech (via OpenAI Whisper)
- Detect the speaker's accent (28-class Wav2Vec2 model)
- View a top-5 ranked list of likely accents with confidence scores
---
## How to Use
Option 1: Upload an audio file
- Supported formats: .mp3, .wav
Option 2: Upload a video file
- Supported format: .mp4 (audio will be extracted automatically)
Option 3: Paste a direct .mp4 video URL
- Must be a direct video file URL (not a webpage)
- Example: a file hosted on archive.org or a CDN
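
The distinction between a direct file URL and a webpage link can be checked before any download is attempted. The following is a minimal sketch, not the app's actual validation logic: the function name `looks_like_direct_mp4` and the example URLs are hypothetical, and the check is only a heuristic based on the URL path (a server can still return HTML for a path ending in `.mp4`).

```python
from urllib.parse import urlparse


def looks_like_direct_mp4(url: str) -> bool:
    """Heuristic: True if the URL is http(s) and its path ends in .mp4."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        return False
    # Inspect only the path, so a query string such as ?download=1
    # does not hide the file extension.
    return parsed.path.lower().endswith(".mp4")


if __name__ == "__main__":
    # Hypothetical URLs, for illustration only.
    print(looks_like_direct_mp4("https://example.com/media/clip.mp4?download=1"))
    print(looks_like_direct_mp4("https://www.youtube.com/watch?v=abc123"))
```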
---
## Not Supported
- Loom, YouTube, Dropbox, and other webpage links (these URLs return a web page, not the video file itself)
- Workaround: download the video manually, then upload the file
---
## Models Used
**Transcription:**
- [openai/whisper-tiny](https://huggingface.co/openai/whisper-tiny)
**Accent Classification:**
- [ylacombe/accent-classifier](https://huggingface.co/ylacombe/accent-classifier)
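
As a rough illustration of how these two checkpoints might be loaded, the sketch below uses the `pipeline` helper from `transformers`. It is not taken from `app.py`: the function names `load_pipelines` and `analyze` are hypothetical, and it assumes the accent checkpoint is compatible with the `audio-classification` pipeline task. Passing a file path to either pipeline relies on ffmpeg being installed for decoding.

```python
ASR_MODEL_ID = "openai/whisper-tiny"
ACCENT_MODEL_ID = "ylacombe/accent-classifier"


def load_pipelines():
    """Build the two Hugging Face pipelines (downloads weights on first use)."""
    # Imported lazily so that importing this module stays cheap.
    from transformers import pipeline

    asr = pipeline("automatic-speech-recognition", model=ASR_MODEL_ID)
    # Assumption: the checkpoint loads under the audio-classification task.
    accent = pipeline("audio-classification", model=ACCENT_MODEL_ID)
    return asr, accent


def analyze(wav_path, asr, accent, top_k=5):
    """Return (transcript, ranked accents) for one audio file."""
    text = asr(wav_path)["text"]
    # Each entry is a dict with "label" and "score" keys, highest score first.
    ranked = accent(wav_path, top_k=top_k)
    return text, ranked
```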
---
## Running Locally
To set up and run the app locally, follow these steps:
1. **Clone the repository**
```bash
git clone https://huggingface.co/spaces/usamaijaz-ai/accent-classifier
cd accent-classifier
```
2. **Create a virtual environment (optional but recommended)**
```bash
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
```
3. **Install the dependencies**
```bash
pip install -r requirements.txt
```
If there’s no `requirements.txt`, use:
```bash
pip install gradio==4.20.0 transformers torch moviepy==1.0.3 requests safetensors soundfile scipy
```
4. **Install ffmpeg**
- **macOS:** `brew install ffmpeg`
- **Ubuntu:** `sudo apt install ffmpeg`
- **Windows:** [Download here](https://ffmpeg.org/download.html) and add to PATH
5. **Run the app**
```bash
python app.py
```
6. **Access in your browser**
Visit `http://localhost:7860` to use the app locally.
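
If the app fails while processing a video, a missing ffmpeg binary is a common cause. The short check below is a sketch (the helper name `ffmpeg_available` is hypothetical and not part of the app) that uses the standard library to confirm `ffmpeg` is discoverable on `PATH` before you launch `app.py`.

```python
import shutil


def ffmpeg_available() -> bool:
    """Return True if an `ffmpeg` executable can be found on PATH."""
    return shutil.which("ffmpeg") is not None


if __name__ == "__main__":
    if ffmpeg_available():
        print("ffmpeg found")
    else:
        print("ffmpeg not found on PATH; see step 4")
```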
---
## How It Works
1. Audio is extracted (if input is a video)
2. Audio is converted to `.wav` and resampled to 16 kHz
3. Speech is transcribed using Whisper
4. Accent is classified using a Wav2Vec2 model
5. Output includes:
- Top accent prediction
- Confidence score
- Top-5 accent list
- Full transcription
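
Two of the steps above can be sketched in plain Python. This is an illustration under stated assumptions, not the code in `app.py`: `ffmpeg_to_wav_cmd` and `top_k_accents` are hypothetical helper names, the conversion is assumed to go through the ffmpeg CLI, downmixing to mono (`-ac 1`) is an assumption the README does not state, and the labels in the usage lines are placeholders. The first function builds the command for step 2; the second turns raw classifier scores (logits) into the ranked list with confidence values described in step 5. The command list can be passed to `subprocess.run(cmd, check=True)`.

```python
import math


def ffmpeg_to_wav_cmd(src, dst, sample_rate=16000):
    """Build an ffmpeg command that extracts audio as a resampled WAV file."""
    return [
        "ffmpeg",
        "-y",                     # overwrite the output file if it exists
        "-i", src,                # input audio or video file
        "-vn",                    # drop any video stream
        "-ac", "1",               # mono (assumption, not stated in the README)
        "-ar", str(sample_rate),  # resample to the target rate
        dst,
    ]


def top_k_accents(logits, labels, k=5):
    """Softmax the logits and return the k most likely (label, prob) pairs."""
    peak = max(logits)
    # Subtract the maximum before exponentiating for numerical stability.
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    ranked = sorted(zip(labels, probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


if __name__ == "__main__":
    print(" ".join(ffmpeg_to_wav_cmd("input.mp4", "audio.wav")))
    # Placeholder labels and scores, for illustration only.
    for label, prob in top_k_accents([2.0, 0.5, 1.0], ["A", "B", "C"], k=2):
        print(f"{label}: {prob:.2%}")
```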
---