Spaces: usamaijaz-ai/accent-classifier


usamaijaz-ai committed on May 9

Commit 6e7344a (1 parent: c8c0038)

added readme file

Files changed (1)

README.md +68 -10


@@ -1,13 +1,71 @@
 ---
-title: Accent Classifier
-emoji: 🌖
-colorFrom: purple
-colorTo: blue
-sdk: gradio
-sdk_version: 5.29.0
-app_file: app.py
-pinned: false
-short_description: classifies the accents in an audio file
 ---
-Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

+# Accent Classifier + Speech Transcriber
+This Gradio app allows you to:
+- Upload or link to audio/video files
+- Automatically transcribe the speech (via OpenAI Whisper)
+- Detect the speaker's accent (28-class Wav2Vec2 model)
+- View a top-5 ranked list of likely accents with confidence scores
+---
+## How to Use
+Option 1: Upload an audio file
+- Supported formats: .mp3, .wav
+Option 2: Upload a video file
+- Supported format: .mp4 (audio will be extracted automatically)
+Option 3: Paste a direct .mp4 video URL
+- Must be a direct video file URL (not a webpage)
+- Example: a file hosted on archive.org or a CDN
+---
+## Not Supported
+- Loom, YouTube, Dropbox, or other webpage links (these serve a web page, not a raw video file)
+- Workaround: download the video manually and upload the file instead
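
The distinction between a direct video URL and a webpage link can be checked before any download is attempted. The sketch below is only a heuristic and is not taken from the app's code: the helper name `looks_like_direct_mp4` is hypothetical, and it assumes that a "direct" link is an http(s) URL whose path ends in `.mp4`.

```python
from urllib.parse import urlparse


def looks_like_direct_mp4(url: str) -> bool:
    """Heuristic check (an assumption, not the app's actual logic):
    accept only http(s) URLs whose path component ends in .mp4."""
    parsed = urlparse(url)
    # Reject anything that is not an absolute http(s) URL.
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        return False
    # Look at the path only, so query strings such as ?download=1 are ignored.
    return parsed.path.lower().endswith(".mp4")
```

A check like this rejects typical Loom or YouTube page links early, but it cannot confirm that the server really returns video data, so a failed download still needs to be handled.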
 ---
+## Models Used
+Transcription:
+- openai/whisper-tiny: https://huggingface.co/openai/whisper-tiny
+Accent Classification:
+- ylacombe/accent-classifier: https://huggingface.co/ylacombe/accent-classifier
+---
+## Dependencies
+Handled automatically in Hugging Face Spaces.
+For local testing:
+pip install gradio transformers torch moviepy requests safetensors soundfile scipy
+You must also install ffmpeg:
+- macOS: brew install ffmpeg
+- Ubuntu: sudo apt install ffmpeg
+- Windows: Download from https://ffmpeg.org/
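
For local testing, it may help to confirm that ffmpeg is on the PATH before launching the app. This is a minimal sketch using only the standard library; the helper name `missing_tools` is hypothetical and not part of the app.

```python
import shutil


def missing_tools(names=("ffmpeg",)):
    """Return the subset of command names that cannot be found on PATH."""
    return [name for name in names if shutil.which(name) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing required tools:", ", ".join(missing))
    else:
        print("ffmpeg found on PATH.")
```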
 ---
+## How It Works
+1. Audio is extracted (if input is a video)
+2. Audio is converted to .wav and resampled to 16 kHz
+3. Speech is transcribed using Whisper
+4. Accent is classified using a Wav2Vec2 model
+5. Output includes:
+   - Top accent prediction
+   - Confidence score
+   - Top-5 accent list
+   - Full transcription
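
The top prediction, confidence score, and top-5 list in step 5 can be illustrated with a small ranking function. This is a sketch under assumptions: it supposes the classifier emits one raw score (logit) per accent class and that confidence is the softmax probability, which the README does not confirm. The function name `top_k_accents` and the labels in the usage note are hypothetical.

```python
import math


def top_k_accents(logits, labels, k=5):
    """Turn raw class scores into softmax probabilities and
    return the k most likely (label, probability) pairs."""
    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(logits)
    exps = [math.exp(score - peak) for score in logits]
    total = sum(exps)
    probs = [value / total for value in exps]
    # Sort by probability, highest first, and keep the first k entries.
    ranked = sorted(zip(labels, probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]
```

For example, `top_k_accents(scores, accent_names)[0]` would give the top accent and its confidence, while the full returned list corresponds to the top-5 table.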
+---