Spaces: usamaijaz-ai/accent-classifier (Sleeping)

Commit b5e4ecb · updated readme
Parent(s): 2233480

Files changed:
- .gitattributes +2 -0
- README.md +43 -19
- app.py +5 -5
.gitattributes CHANGED

@@ -33,3 +33,5 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
+*.wav filter=lfs diff=lfs merge=lfs -text
+*.mp4 filter=lfs diff=lfs merge=lfs -text
README.md CHANGED

@@ -9,7 +9,6 @@ app_file: app.py
 pinned: false
 ---
 
-
 # Accent Classifier + Speech Transcriber
 
 This Gradio app allows you to:
@@ -23,19 +22,18 @@ This Gradio app allows you to:
 
 ## How to Use
 
-Option 1: Upload an audio file
+Option 1: Upload an audio file
 - Supported formats: .mp3, .wav
 
-Option 2: Upload a video file
+Option 2: Upload a video file
 - Supported format: .mp4 (audio will be extracted automatically)
 
-Option 3: Paste a direct .mp4 video URL
-- Must be a direct video file URL (not a webpage)
+Option 3: Paste a direct .mp4 video URL
+- Must be a direct video file URL (not a webpage)
 - Example: a file hosted on archive.org or a CDN
 
 ---
 
-
 ## Not Supported
 
 - Loom, YouTube, Dropbox, or other webpage links (they don't serve real video files)
@@ -45,33 +43,59 @@ Option 3: Paste a direct .mp4 video URL
 
 ## Models Used
 
-Transcription
-- openai/whisper-tiny
+**Transcription:**
+- [openai/whisper-tiny](https://huggingface.co/openai/whisper-tiny)
 
-Accent Classification
-- ylacombe/accent-classifier
+**Accent Classification:**
+- [ylacombe/accent-classifier](https://huggingface.co/ylacombe/accent-classifier)
 
 ---
 
-##
-[…removed lines collapsed in the original diff view…]
-- Windows: Download from https://ffmpeg.org/
+## Running Locally
+
+To set this up and run locally, follow these steps:
+
+1. **Clone the repository**
+   ```bash
+   git clone https://huggingface.co/spaces/usamaijaz-ai/accent-classifier
+   cd accent-classifier
+   ```
+
+2. **Create a virtual environment (optional but recommended)**
+   ```bash
+   python -m venv venv
+   source venv/bin/activate  # On Windows: venv\Scripts\activate
+   ```
+
+3. **Install the dependencies**
+   ```bash
+   pip install -r requirements.txt
+   ```
 
+   If there’s no `requirements.txt`, use:
+   ```bash
+   pip install gradio==4.20.0 transformers torch moviepy==1.0.3 requests safetensors soundfile scipy
+   ```
 
+4. **Install ffmpeg**
+   - **macOS:** `brew install ffmpeg`
+   - **Ubuntu:** `sudo apt install ffmpeg`
+   - **Windows:** [Download here](https://ffmpeg.org/download.html) and add to PATH
 
+5. **Run the app**
+   ```bash
+   python app.py
+   ```
 
+6. **Access in your browser**
+   Visit `http://localhost:7860` to use the app locally.
 
 ---
 
 ## How It Works
 
 1. Audio is extracted (if input is a video)
-2. Audio is converted to
+2. Audio is converted to `.wav` and resampled to 16kHz
 3. Speech is transcribed using Whisper
 4. Accent is classified using a Wav2Vec2 model
 5. Output includes:
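
The updated "How It Works" list maps onto a short script. The sketch below is illustrative, not the app's actual code: it assumes the pinned dependencies from step 3 of "Running Locally", and the input filename, mono downmix, and float32 cast are assumptions.

```python
# Minimal sketch of the README's "How It Works" steps (illustrative, not app.py).
# Assumes "input.mp4" exists; the mono downmix and float32 cast are assumptions.
import soundfile as sf
import torch
from moviepy.editor import VideoFileClip
from scipy.signal import resample
from transformers import (Wav2Vec2FeatureExtractor,
                          Wav2Vec2ForSequenceClassification, pipeline)

# Step 1: extract the audio track from the video
VideoFileClip("input.mp4").audio.write_audiofile("converted_audio.wav")

# Step 2: load the .wav and resample to the 16 kHz both models expect
audio, sr = sf.read("converted_audio.wav")
if audio.ndim > 1:
    audio = audio.mean(axis=1)                       # downmix stereo to mono
if sr != 16000:
    audio = resample(audio, int(len(audio) * 16000 / sr))
audio = audio.astype("float32")

# Step 3: transcribe with Whisper
whisper = pipeline("automatic-speech-recognition", model="openai/whisper-tiny")
text = whisper({"raw": audio, "sampling_rate": 16000})["text"]

# Step 4: classify the accent with the Wav2Vec2 classifier
model = Wav2Vec2ForSequenceClassification.from_pretrained("ylacombe/accent-classifier")
extractor = Wav2Vec2FeatureExtractor.from_pretrained("ylacombe/accent-classifier")
inputs = extractor(audio, sampling_rate=16000, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

# Step 5: report the transcript plus the predicted accent label
print(text)
print(model.config.id2label[int(logits.argmax(-1))])
```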
app.py CHANGED

@@ -16,14 +16,14 @@ CONVERTED_AUDIO = "converted_audio.wav"
 MODEL_REPO = "ylacombe/accent-classifier"
 
 # === load local model
-
-
-
+MODEL_DIR = "model"
+model = Wav2Vec2ForSequenceClassification.from_pretrained(MODEL_DIR, local_files_only=True)
+feature_extractor = Wav2Vec2FeatureExtractor.from_pretrained(MODEL_DIR)
 
 
 # # === Load models ===
-model = Wav2Vec2ForSequenceClassification.from_pretrained(MODEL_REPO)
-feature_extractor = Wav2Vec2FeatureExtractor.from_pretrained(MODEL_REPO)
+# model = Wav2Vec2ForSequenceClassification.from_pretrained(MODEL_REPO)
+# feature_extractor = Wav2Vec2FeatureExtractor.from_pretrained(MODEL_REPO)
 whisper = pipeline("automatic-speech-recognition", model="openai/whisper-tiny")
 
 LABELS = [model.config.id2label[i] for i in range(len(model.config.id2label))]
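
The change above makes app.py load the classifier from a local `model` directory with `local_files_only=True`, so that directory must already contain the weights before the app starts. One plausible way to populate it (an assumption, not part of this commit) is a one-time snapshot from the Hub:

```python
# Hypothetical one-time setup: save the Hub checkpoint into the local "model"
# directory that the updated app.py loads from. Not part of this commit.
from transformers import Wav2Vec2FeatureExtractor, Wav2Vec2ForSequenceClassification

repo = "ylacombe/accent-classifier"
Wav2Vec2ForSequenceClassification.from_pretrained(repo).save_pretrained("model")
Wav2Vec2FeatureExtractor.from_pretrained(repo).save_pretrained("model")
```

After this runs once with network access, the accent classifier loads entirely from disk; Whisper is still fetched from the Hub at startup, as the unchanged `pipeline(...)` line shows.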