Spaces: usamaijaz-ai/accent-classifier (Sleeping)

Commit b5e4ecb · updated readme
Parent(s): 2233480

Files changed:
- .gitattributes +2 -0
- README.md +43 -19
- app.py +5 -5
.gitattributes CHANGED

@@ -33,3 +33,5 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
+*.wav filter=lfs diff=lfs merge=lfs -text
+*.mp4 filter=lfs diff=lfs merge=lfs -text
README.md CHANGED

@@ -9,7 +9,6 @@ app_file: app.py
 pinned: false
 ---
 
-
 # Accent Classifier + Speech Transcriber
 
 This Gradio app allows you to:
@@ -23,19 +22,18 @@ This Gradio app allows you to:
 
 ## How to Use
 
-Option 1: Upload an audio file
+Option 1: Upload an audio file
 - Supported formats: .mp3, .wav
 
-Option 2: Upload a video file
+Option 2: Upload a video file
 - Supported format: .mp4 (audio will be extracted automatically)
 
-Option 3: Paste a direct .mp4 video URL
-- Must be a direct video file URL (not a webpage)
+Option 3: Paste a direct .mp4 video URL
+- Must be a direct video file URL (not a webpage)
 - Example: a file hosted on archive.org or a CDN
 
 ---
 
-
 ## Not Supported
 
 - Loom, YouTube, Dropbox, or other webpage links (they don't serve real video files)
@@ -45,33 +43,59 @@ Option 3: Paste a direct .mp4 video URL
 
 ## Models Used
 
-Transcription
-- openai/whisper-tiny
+**Transcription:**
+- [openai/whisper-tiny](https://huggingface.co/openai/whisper-tiny)
 
-Accent Classification
-- ylacombe/accent-classifier
+**Accent Classification:**
+- [ylacombe/accent-classifier](https://huggingface.co/ylacombe/accent-classifier)
 
 ---
 
-##
-[…removed lines collapsed in the original diff view…]
-- Windows: Download from https://ffmpeg.org/
+## Running Locally
+
+To set this up and run locally, follow these steps:
+
+1. **Clone the repository**
+   ```bash
+   git clone https://huggingface.co/spaces/usamaijaz-ai/accent-classifier
+   cd accent-classifier
+   ```
+
+2. **Create a virtual environment (optional but recommended)**
+   ```bash
+   python -m venv venv
+   source venv/bin/activate  # On Windows: venv\Scripts\activate
+   ```
+
+3. **Install the dependencies**
+   ```bash
+   pip install -r requirements.txt
+   ```
 
+   If there’s no `requirements.txt`, use:
+   ```bash
+   pip install gradio==4.20.0 transformers torch moviepy==1.0.3 requests safetensors soundfile scipy
+   ```
 
+4. **Install ffmpeg**
+   - **macOS:** `brew install ffmpeg`
+   - **Ubuntu:** `sudo apt install ffmpeg`
+   - **Windows:** [Download here](https://ffmpeg.org/download.html) and add to PATH
 
+5. **Run the app**
+   ```bash
+   python app.py
+   ```
 
+6. **Access in your browser**
+   Visit `http://localhost:7860` to use the app locally.
 
 ---
 
 ## How It Works
 
 1. Audio is extracted (if input is a video)
-2. Audio is converted to
+2. Audio is converted to `.wav` and resampled to 16kHz
 3. Speech is transcribed using Whisper
 4. Accent is classified using a Wav2Vec2 model
 5. Output includes:
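
The updated "How It Works" list maps onto a short script. The sketch below is illustrative, not the app's actual code: it assumes the pinned dependencies from step 3 of "Running Locally", and the input filename, mono downmix, and float32 cast are assumptions.

```python
# Minimal sketch of the README's "How It Works" steps (illustrative, not app.py).
# Assumes "input.mp4" exists; the mono downmix and float32 cast are assumptions.
import soundfile as sf
import torch
from moviepy.editor import VideoFileClip
from scipy.signal import resample
from transformers import (Wav2Vec2FeatureExtractor,
                          Wav2Vec2ForSequenceClassification, pipeline)

# Step 1: extract the audio track from the video
VideoFileClip("input.mp4").audio.write_audiofile("converted_audio.wav")

# Step 2: load the .wav and resample to the 16 kHz both models expect
audio, sr = sf.read("converted_audio.wav")
if audio.ndim > 1:
    audio = audio.mean(axis=1)                       # downmix stereo to mono
if sr != 16000:
    audio = resample(audio, int(len(audio) * 16000 / sr))
audio = audio.astype("float32")

# Step 3: transcribe with Whisper
whisper = pipeline("automatic-speech-recognition", model="openai/whisper-tiny")
text = whisper({"raw": audio, "sampling_rate": 16000})["text"]

# Step 4: classify the accent with the Wav2Vec2 classifier
model = Wav2Vec2ForSequenceClassification.from_pretrained("ylacombe/accent-classifier")
extractor = Wav2Vec2FeatureExtractor.from_pretrained("ylacombe/accent-classifier")
inputs = extractor(audio, sampling_rate=16000, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

# Step 5: report the transcript plus the predicted accent label
print(text)
print(model.config.id2label[int(logits.argmax(-1))])
```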
app.py CHANGED

@@ -16,14 +16,14 @@ CONVERTED_AUDIO = "converted_audio.wav"
 MODEL_REPO = "ylacombe/accent-classifier"
 
 # === load local model
-
-
-
+MODEL_DIR = "model"
+model = Wav2Vec2ForSequenceClassification.from_pretrained(MODEL_DIR, local_files_only=True)
+feature_extractor = Wav2Vec2FeatureExtractor.from_pretrained(MODEL_DIR)
 
 
 # # === Load models ===
-model = Wav2Vec2ForSequenceClassification.from_pretrained(MODEL_REPO)
-feature_extractor = Wav2Vec2FeatureExtractor.from_pretrained(MODEL_REPO)
+# model = Wav2Vec2ForSequenceClassification.from_pretrained(MODEL_REPO)
+# feature_extractor = Wav2Vec2FeatureExtractor.from_pretrained(MODEL_REPO)
 whisper = pipeline("automatic-speech-recognition", model="openai/whisper-tiny")
 
 LABELS = [model.config.id2label[i] for i in range(len(model.config.id2label))]
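
The change above makes app.py load the classifier from a local `model` directory with `local_files_only=True`, so that directory must already contain the weights before the app starts. One plausible way to populate it (an assumption, not part of this commit) is a one-time snapshot from the Hub:

```python
# Hypothetical one-time setup: save the Hub checkpoint into the local "model"
# directory that the updated app.py loads from. Not part of this commit.
from transformers import Wav2Vec2FeatureExtractor, Wav2Vec2ForSequenceClassification

repo = "ylacombe/accent-classifier"
Wav2Vec2ForSequenceClassification.from_pretrained(repo).save_pretrained("model")
Wav2Vec2FeatureExtractor.from_pretrained(repo).save_pretrained("model")
```

After this runs once with network access, the accent classifier loads entirely from disk; Whisper is still fetched from the Hub at startup, as the unchanged `pipeline(...)` line shows.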