---
title: Accent Classifier + Transcriber
emoji: 🎙️
colorFrom: indigo
colorTo: purple
sdk: gradio
sdk_version: "4.20.0"
app_file: app.py
pinned: false
---
# Accent Classifier + Speech Transcriber
This Gradio app allows you to:
- Upload or link to audio/video files
- Automatically transcribe the speech (via OpenAI Whisper)
- Detect the speaker's accent (28-class Wav2Vec2 model)
- View a top-5 ranked list of likely accents with confidence scores
---
## How to Use
Option 1: Upload an audio file
- Supported formats: .mp3, .wav
Option 2: Upload a video file
- Supported format: .mp4 (audio will be extracted automatically)
Option 3: Paste a direct .mp4 video URL
- Must be a direct video file URL (not a webpage)
- Example: a file hosted on archive.org or a CDN
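
To make the "direct URL" requirement concrete, here is a minimal sketch of a heuristic check. The function name `looks_like_direct_mp4` and the example URLs are hypothetical and are not taken from `app.py`; the check only inspects the URL path, so it can still be fooled by a server that returns an HTML page at a `.mp4` path.

```python
from urllib.parse import urlparse


def looks_like_direct_mp4(url):
    """Heuristic: True if this is an http(s) URL whose path ends in .mp4."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https"):
        return False
    # Only the path is checked, so query strings such as ?token=... are ignored.
    return parsed.path.lower().endswith(".mp4")


# Hypothetical URLs, for illustration only.
print(looks_like_direct_mp4("https://example.com/media/clip.mp4"))
print(looks_like_direct_mp4("https://www.youtube.com/watch?v=abc123"))
```

A stricter check would also request the URL and confirm that the response `Content-Type` is a video type before downloading.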
---
## Not Supported
- Loom, YouTube, Dropbox, or other webpage links (these point to web pages, not to the video file itself)
- Workaround: download the video manually, then upload it
---
## Models Used
**Transcription:**
- [openai/whisper-tiny](https://huggingface.co/openai/whisper-tiny)
**Accent Classification:**
- [ylacombe/accent-classifier](https://huggingface.co/ylacombe/accent-classifier)
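
For orientation, the following is a minimal sketch of how these two checkpoints could be loaded with the `transformers` `pipeline` API. It is not necessarily how `app.py` does it: the helper names `load_pipelines` and `analyze`, the `chunk_length_s=30` setting, and the assumption that the accent model works with the `audio-classification` task are all illustrative choices.

```python
TRANSCRIBER_ID = "openai/whisper-tiny"
CLASSIFIER_ID = "ylacombe/accent-classifier"


def load_pipelines():
    """Build both pipelines; model weights are downloaded on first use."""
    from transformers import pipeline  # imported lazily: heavy dependency

    transcriber = pipeline(
        "automatic-speech-recognition",
        model=TRANSCRIBER_ID,
        chunk_length_s=30,  # assumption: process long clips in 30 s chunks
    )
    # Assumption: the checkpoint is compatible with the audio-classification task.
    classifier = pipeline("audio-classification", model=CLASSIFIER_ID)
    return transcriber, classifier


def analyze(wav_path, top_k=5):
    """Return (transcript, ranked accents) for one audio file."""
    transcriber, classifier = load_pipelines()
    text = transcriber(wav_path)["text"]
    accents = classifier(wav_path, top_k=top_k)  # list of {"label", "score"} dicts
    return text, accents
```

Calling `analyze("sample.wav")` (a hypothetical file) would return the transcript and up to five accent candidates. Passing a file path to these pipelines relies on ffmpeg being installed.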
---
## Running Locally
To run the app locally, follow these steps:
1. **Clone the repository**
```bash
git clone https://huggingface.co/spaces/usamaijaz-ai/accent-classifier
cd accent-classifier
```
2. **Create a virtual environment (optional but recommended)**
```bash
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
```
3. **Install the dependencies**
```bash
pip install -r requirements.txt
```
If there’s no `requirements.txt`, use:
```bash
pip install gradio==4.20.0 transformers torch moviepy==1.0.3 requests safetensors soundfile scipy
```
4. **Install ffmpeg**
- **macOS:** `brew install ffmpeg`
- **Ubuntu:** `sudo apt install ffmpeg`
- **Windows:** [Download here](https://ffmpeg.org/download.html) and add to PATH
5. **Run the app**
```bash
python app.py
```
6. **Access in your browser**
Visit `http://localhost:7860` to use the app locally.
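
If the app fails on video or audio input, a missing ffmpeg binary is a common cause. The sketch below is an optional pre-flight check and is not part of the repository; `missing_tools` is a hypothetical helper name.

```python
import shutil


def missing_tools(tools=("ffmpeg",)):
    """Return the names of required command-line tools not found on PATH."""
    return [name for name in tools if shutil.which(name) is None]


missing = missing_tools()
if missing:
    print("Missing required tools:", ", ".join(missing))
else:
    print("ffmpeg found on PATH")
```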
---
## How It Works
1. Audio is extracted (if input is a video)
2. Audio is converted to `.wav` and resampled to 16 kHz
3. Speech is transcribed using Whisper
4. Accent is classified using a Wav2Vec2 model
5. Output includes:
- Top accent prediction
- Confidence score
- Top-5 accent list
- Full transcription
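
Steps 1, 2, and 5 can be sketched as follows. This is an illustration under stated assumptions rather than the code in `app.py` (which may use moviepy for extraction): `build_ffmpeg_cmd` and `top_k_accents` are hypothetical helpers, the mono downmix (`-ac 1`) is an assumption not stated above, and the accent labels in the example are made up.

```python
import math


def build_ffmpeg_cmd(src, dst="audio.wav"):
    """ffmpeg arguments: overwrite output (-y), drop video (-vn),
    downmix to mono (-ac 1, an assumption), resample to 16 kHz (-ar 16000)."""
    return ["ffmpeg", "-y", "-i", src, "-vn", "-ac", "1", "-ar", "16000", dst]


def top_k_accents(logits, k=5):
    """Softmax raw per-label scores and return the k most likely (label, prob) pairs."""
    peak = max(logits.values())  # subtract the max for numerical stability
    exps = {label: math.exp(value - peak) for label, value in logits.items()}
    total = sum(exps.values())
    ranked = sorted(exps.items(), key=lambda item: item[1], reverse=True)
    return [(label, weight / total) for label, weight in ranked[:k]]


# Hypothetical raw scores, for illustration only.
scores = {"english": 2.0, "scottish": 1.0, "irish": 0.0}
for label, prob in top_k_accents(scores, k=2):
    print(f"{label}: {prob:.1%}")
```

The command list can be passed to `subprocess.run(cmd, check=True)`. The first entry returned by `top_k_accents` corresponds to the top accent prediction and its confidence score.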
---