usamaijaz-ai committed
Commit b5e4ecb · 1 Parent(s): 2233480

updated readme

Files changed (3)
  1. .gitattributes +2 -0
  2. README.md +43 -19
  3. app.py +5 -5
.gitattributes CHANGED
@@ -33,3 +33,5 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
+*.wav filter=lfs diff=lfs merge=lfs -text
+*.mp4 filter=lfs diff=lfs merge=lfs -text
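The two added patterns route `.wav` and `.mp4` files through Git LFS so the sample media doesn't bloat the repo. As an illustration of what the `filter=lfs` attribute means, here is a minimal sketch (plain Python, hypothetical `lfs_patterns` helper — not part of the repo) that lists which patterns in a `.gitattributes` file are LFS-tracked:

```python
def lfs_patterns(gitattributes_text: str) -> list[str]:
    """Return the path patterns whose attributes include filter=lfs."""
    patterns = []
    for line in gitattributes_text.splitlines():
        fields = line.split()
        # First field is the pattern; the rest are attributes.
        if len(fields) > 1 and "filter=lfs" in fields[1:]:
            patterns.append(fields[0])
    return patterns

attrs = """\
*.zip filter=lfs diff=lfs merge=lfs -text
*.wav filter=lfs diff=lfs merge=lfs -text
*.mp4 filter=lfs diff=lfs merge=lfs -text
"""
print(lfs_patterns(attrs))  # ['*.zip', '*.wav', '*.mp4']
```

Equivalently, running `git lfs track "*.wav"` in the repo would append the same attribute line rather than editing the file by hand.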
README.md CHANGED
@@ -9,7 +9,6 @@ app_file: app.py
 pinned: false
 ---
 
-
 # Accent Classifier + Speech Transcriber
 
 This Gradio app allows you to:
@@ -23,19 +22,18 @@ This Gradio app allows you to:
 
 ## How to Use
 
-Option 1: Upload an audio file
+Option 1: Upload an audio file
 - Supported formats: .mp3, .wav
 
-Option 2: Upload a video file
+Option 2: Upload a video file
 - Supported format: .mp4 (audio will be extracted automatically)
 
-Option 3: Paste a direct .mp4 video URL
-- Must be a direct video file URL (not a webpage)
+Option 3: Paste a direct .mp4 video URL
+- Must be a direct video file URL (not a webpage)
 - Example: a file hosted on archive.org or a CDN
 
 ---
 
-
 ## Not Supported
 
 - Loom, YouTube, Dropbox, or other webpage links (they don't serve real video files)
@@ -45,33 +43,59 @@ Option 3: Paste a direct .mp4 video URL
 
 ## Models Used
 
-Transcription:
-- openai/whisper-tiny: https://huggingface.co/openai/whisper-tiny
+**Transcription:**
+- [openai/whisper-tiny](https://huggingface.co/openai/whisper-tiny)
 
-Accent Classification:
-- ylacombe/accent-classifier: https://huggingface.co/ylacombe/accent-classifier
+**Accent Classification:**
+- [ylacombe/accent-classifier](https://huggingface.co/ylacombe/accent-classifier)
 
 ---
 
-## Dependencies
+## Running Locally
+
+To set this up and run locally, follow these steps:
+
+1. **Clone the repository**
+```bash
+git clone https://huggingface.co/spaces/usamaijaz-ai/accent-classifier
+cd accent-classifier
+```
+
+2. **Create a virtual environment (optional but recommended)**
+```bash
+python -m venv venv
+source venv/bin/activate  # On Windows: venv\Scripts\activate
+```
+
+3. **Install the dependencies**
+```bash
+pip install -r requirements.txt
+```
 
-Handled automatically in Hugging Face Spaces.
-For local testing:
+If there’s no `requirements.txt`, use:
+```bash
+pip install gradio==4.20.0 transformers torch moviepy==1.0.3 requests safetensors soundfile scipy
+```
 
-pip install gradio transformers torch moviepy requests safetensors soundfile scipy
+4. **Install ffmpeg**
+- **macOS:** `brew install ffmpeg`
+- **Ubuntu:** `sudo apt install ffmpeg`
+- **Windows:** [Download here](https://ffmpeg.org/download.html) and add to PATH
 
-You must also install ffmpeg:
+5. **Run the app**
+```bash
+python app.py
+```
 
-- macOS: brew install ffmpeg
-- Ubuntu: sudo apt install ffmpeg
-- Windows: Download from https://ffmpeg.org/
+6. **Access in your browser**
+Visit `http://localhost:7860` to use the app locally.
 
 ---
 
 ## How It Works
 
 1. Audio is extracted (if input is a video)
-2. Audio is converted to .wav and resampled to 16kHz
+2. Audio is converted to `.wav` and resampled to 16kHz
 3. Speech is transcribed using Whisper
 4. Accent is classified using a Wav2Vec2 model
 5. Output includes:
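Step 2 of the README's "How It Works" list — resampling to 16 kHz, the sample rate both Whisper and Wav2Vec2-style models expect — can be sketched as below. This is a naive pure-Python linear-interpolation resampler for illustration only; the app itself presumably delegates this to ffmpeg/scipy, and `resample_to_16k` is a hypothetical name, not a function from the repo:

```python
def resample_to_16k(samples, orig_sr, target_sr=16_000):
    """Resample a mono waveform to 16 kHz by linear interpolation.

    Illustrative only: real pipelines use a proper low-pass filter
    (e.g. ffmpeg or scipy.signal) to avoid aliasing.
    """
    n_out = round(len(samples) * target_sr / orig_sr)
    out = []
    for i in range(n_out):
        # Fractional position of output sample i in the input signal.
        pos = i * (len(samples) - 1) / max(n_out - 1, 1)
        lo = int(pos)
        hi = min(lo + 1, len(samples) - 1)
        frac = pos - lo
        out.append(samples[lo] * (1 - frac) + samples[hi] * frac)
    return out

one_sec_44k = [0.0] * 44_100   # 1 second of silence at 44.1 kHz
out = resample_to_16k(one_sec_44k, 44_100)
print(len(out))  # 16000
```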
app.py CHANGED
@@ -16,14 +16,14 @@ CONVERTED_AUDIO = "converted_audio.wav"
 MODEL_REPO = "ylacombe/accent-classifier"
 
 # === load local model
-# MODEL_DIR = "model"
-# model = Wav2Vec2ForSequenceClassification.from_pretrained(MODEL_DIR, local_files_only=True)
-# feature_extractor = Wav2Vec2FeatureExtractor.from_pretrained(MODEL_DIR)
+MODEL_DIR = "model"
+model = Wav2Vec2ForSequenceClassification.from_pretrained(MODEL_DIR, local_files_only=True)
+feature_extractor = Wav2Vec2FeatureExtractor.from_pretrained(MODEL_DIR)
 
 
 # # === Load models ===
-model = Wav2Vec2ForSequenceClassification.from_pretrained(MODEL_REPO)
-feature_extractor = Wav2Vec2FeatureExtractor.from_pretrained(MODEL_REPO)
+# model = Wav2Vec2ForSequenceClassification.from_pretrained(MODEL_REPO)
+# feature_extractor = Wav2Vec2FeatureExtractor.from_pretrained(MODEL_REPO)
 whisper = pipeline("automatic-speech-recognition", model="openai/whisper-tiny")
 
 LABELS = [model.config.id2label[i] for i in range(len(model.config.id2label))]
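The app.py hunk switches between the Hub repo and a local snapshot by commenting lines in and out. A hedged alternative that avoids editing code per environment: pick the source at runtime and fall back to the Hub when no local copy exists. `model_source` is a hypothetical helper, not code from this repo; only the path selection is shown (no model is loaded here):

```python
import os

def model_source(local_dir: str = "model",
                 repo_id: str = "ylacombe/accent-classifier") -> tuple[str, bool]:
    """Prefer a local model snapshot if one exists; otherwise use the Hub repo.

    Returns (path_or_repo_id, local_files_only) — the pair you would pass to
    from_pretrained(src, local_files_only=flag) for both the model and the
    feature extractor, so the same code runs locally and on Spaces.
    """
    if os.path.isdir(local_dir) and os.listdir(local_dir):
        return local_dir, True
    return repo_id, False

src, local_only = model_source(local_dir="no_such_dir")
print(src, local_only)  # ylacombe/accent-classifier False
```

With this in place the two blocks in app.py collapse into one `from_pretrained(src, local_files_only=local_only)` call instead of a pair of commented-out variants.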