BoltzmannEntropy committed on
Commit
c184ecd
·
verified ·
1 Parent(s): e3c8524

Upload 2 files

Files changed (2)
  1. Dockerfile +74 -0
  2. app.py +511 -0
Dockerfile ADDED
@@ -0,0 +1,74 @@
+# Dockerfile customized for deployment on the HuggingFace Spaces platform
+
+# -- This Dockerfile is tailored specifically for use on HuggingFace Spaces.
+# -- The base image, user setup, and port settings below were chosen with the Spaces environment in mind.
+# -- "HuggingFace Spaces" is named explicitly to make the target platform clear.
+
+# FROM pytorch/pytorch:2.2.1-cuda12.1-cudnn8-devel
+FROM pytorch/pytorch:2.4.0-cuda12.1-cudnn9-devel
+# FOR HF
+
+USER root
+
+ENV DEBIAN_FRONTEND=noninteractive
+RUN apt-get update && apt-get install -y \
+    git \
+    cmake \
+    python3 \
+    python3-pip \
+    python3-venv \
+    python3-dev \
+    python3-numpy \
+    gcc \
+    build-essential \
+    gfortran \
+    wget \
+    curl \
+    pkg-config \
+    software-properties-common \
+    zip \
+    && apt-get clean && rm -rf /tmp/* /var/tmp/*
+
+RUN apt-get update && DEBIAN_FRONTEND=noninteractive \
+    apt-get install -y python3.10 python3-pip
+
+RUN apt-get install -y libopenblas-base libopenmpi-dev
+
+ENV TZ=Asia/Dubai
+RUN ln -snf /usr/share/zoneinfo/$TZ /etc/localtime && echo $TZ > /etc/timezone
+
+RUN useradd -m -u 1000 user
+
+RUN apt-get update && apt-get install -y sudo && \
+    echo 'user ALL=(ALL) NOPASSWD:ALL' >> /etc/sudoers
+
+USER user
+ENV HOME=/home/user \
+    PATH=/home/user/.local/bin:$PATH
+
+# RUN chown -R user:user $HOME/app
+
+USER user
+WORKDIR $HOME/app
+
+RUN python -m pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
+RUN python -m pip install accelerate diffusers datasets timm flash-attn==2.6.1 gradio faster_whisper jiwer pydub
+
+
+# This seems to be a must: Intel Extension for PyTorch 2.4 needs to work with PyTorch 2.4.*, but PyTorch 2.2.2 is
+RUN python -m pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
+RUN python3 -m pip install -U accelerate scipy
+RUN python3 -m pip install -U git+https://github.com/huggingface/transformers
+
+WORKDIR $HOME/app
+COPY --chown=user:user app.py .
+COPY --chown=user:user heb.wav .
+COPY --chown=user:user noise.wav .
+
+ENV PYTHONUNBUFFERED=1 GRADIO_ALLOW_FLAGGING=never GRADIO_NUM_PORTS=1 GRADIO_SERVER_NAME=0.0.0.0 GRADIO_SERVER_PORT=7860 SYSTEM=spaces
+
+WORKDIR $HOME/app
+
+EXPOSE 8097 7842 8501 8000 6666 7860
+
+CMD ["python", "app.py"]
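Review note: the `ENV` line above sets `GRADIO_SERVER_NAME=0.0.0.0` and `GRADIO_SERVER_PORT=7860`, and `app.py` in this commit passes the same two values to `demo.launch()` as literals. The sketch below shows one way the application side could read those variables instead, so the Dockerfile remains the single place where they are defined. `server_settings` is a hypothetical helper and is not part of this commit; the defaults simply repeat the Dockerfile's values.

```python
import os


def server_settings(env=None):
    """Return (host, port) for the Gradio server.

    Hypothetical helper (not in the commit): reads the two variables the
    Dockerfile exports and falls back to the same values when they are unset.
    """
    env = os.environ if env is None else env
    host = env.get("GRADIO_SERVER_NAME", "0.0.0.0")
    port = int(env.get("GRADIO_SERVER_PORT", "7860"))
    return host, port
```

With such a helper, the launch call could take `server_name` and `server_port` from `server_settings()` rather than repeating the literals.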
app.py ADDED
@@ -0,0 +1,511 @@
+import gradio as gr
+from faster_whisper import WhisperModel
+from pydub import AudioSegment
+import os
+import tempfile
+import time
+import torch
+from pathlib import Path
+import warnings
+import numpy as np
+import torchaudio
+import scipy.io.wavfile as wavfile
+from jiwer import wer, cer
+import re
+import string
+
+# Suppress warnings for cleaner output
+warnings.filterwarnings("ignore")
+
+# Global variables for models
+WHISPER_MODELS = {}
+DEVICE = None
+
+# Model configurations - Hebrew-focused models
+AVAILABLE_WHISPER_MODELS = {
+    "ivrit-ai/faster-whisper-v2-d4": "Hebrew Faster-Whisper V2-D4 (Recommended)",
+    "ivrit-ai/faster-whisper-v2-d3": "Hebrew Faster-Whisper V2-D3",
+    "ivrit-ai/faster-whisper-v2-d2": "Hebrew Faster-Whisper V2-D2",
+    "large-v3": "OpenAI Whisper Large V3 (Multilingual)",
+    "large-v2": "OpenAI Whisper Large V2 (Multilingual)",
+    "medium": "OpenAI Whisper Medium (Multilingual)",
+    "small": "OpenAI Whisper Small (Multilingual)",
+}
+
+# Default audio and transcription
+DEFAULT_AUDIO = "heb.wav"
+DEFAULT_TRANSCRIPTION = "שלום! אנחנו נרגשים להציג לכם את יכולות הדיבור הטבעי שלנו. כאן תוכלו לביים קול, ליצור דיאלוגים מציאותיים ועוד הרבה יותר. ערכו את המקומות הללו כדי להתחיל."
+
+# Predefined audio files
+PREDEFINED_AUDIO_FILES = {
+    "heb.wav": {
+        "file": "heb.wav",
+        "description": "Regular quality Hebrew audio",
+        "transcription": "שלום! אנחנו נרגשים להציג לכם את יכולות הדיבור הטבעי שלנו. כאן תוכלו לביים קול, ליצור דיאלוגים מציאותיים ועוד הרבה יותר. ערכו את המקומות הללו כדי להתחיל."
+    },
+    "noise.wav": {
+        "file": "noise.wav",
+        "description": "Noisy Hebrew audio",
+        "transcription": "אז כך, קרנות החיסכון האלה כאילו מנסות לבנות מנדט לכל הסטארט-אפים הפרטיים.."
+    }
+}
+
+def normalize_hebrew_text(text):
+    """Normalize Hebrew text for WER calculation"""
+    if not text:
+        return ""
+
+    # Remove diacritics (niqqud)
+    hebrew_diacritics = "".join([chr(i) for i in range(0x0591, 0x05C8)])
+    text = "".join(c for c in text if c not in hebrew_diacritics)
+
+    # Remove punctuation
+    text = re.sub(r'[^\w\s]', ' ', text)
+
+    # Remove extra whitespace and convert to lowercase
+    text = ' '.join(text.split()).strip().lower()
+
+    return text
+
+def calculate_wer_cer(reference, hypothesis):
+    """Calculate WER and CER for Hebrew text"""
+    try:
+        # Normalize both texts
+        ref_normalized = normalize_hebrew_text(reference)
+        hyp_normalized = normalize_hebrew_text(hypothesis)
+
+        if not ref_normalized or not hyp_normalized:
+            return float('inf'), float('inf'), ref_normalized, hyp_normalized
+
+        # Calculate WER and CER
+        word_error_rate = wer(ref_normalized, hyp_normalized)
+        char_error_rate = cer(ref_normalized, hyp_normalized)
+
+        return word_error_rate, char_error_rate, ref_normalized, hyp_normalized
+
+    except Exception as e:
+        print(f"Error calculating WER/CER: {e}")
+        return float('inf'), float('inf'), "", ""
+
+def initialize_whisper_model(model_id, progress=gr.Progress()):
+    """Initialize a specific Whisper model with progress indication"""
+    global WHISPER_MODELS, DEVICE
+
+    try:
+        # Skip if model is already loaded
+        if model_id in WHISPER_MODELS and WHISPER_MODELS[model_id] is not None:
+            print(f"✅ Model {model_id} already loaded")
+            return True
+
+        # Determine device
+        if DEVICE is None:
+            DEVICE = "cuda" if torch.cuda.is_available() else "cpu"
+
+        compute_type = "float16" if torch.cuda.is_available() else "int8"
+
+        print(f"🔧 Loading Whisper model: {model_id} on {DEVICE}")
+        progress(0.3, desc=f"Loading {model_id}...")
+
+        # Initialize Whisper model (faster-whisper)
+        WHISPER_MODELS[model_id] = WhisperModel(
+            model_id,
+            device=DEVICE,
+            compute_type=compute_type
+        )
+
+        progress(1.0, desc=f"Loaded {model_id} successfully!")
+        print(f"✅ Model {model_id} initialized successfully!")
+        return True
+
+    except Exception as e:
+        print(f"❌ Error initializing model {model_id}: {str(e)}")
+        WHISPER_MODELS[model_id] = None
+        return False
+
+def transcribe_audio_with_model(audio_file, model_id, language="he"):
+    """Transcribe audio using a specific Whisper model"""
+    try:
+        # Initialize model if needed
+        if model_id not in WHISPER_MODELS or WHISPER_MODELS[model_id] is None:
+            success = initialize_whisper_model(model_id)
+            if not success:
+                return "", f"Failed to load model {model_id}"
+
+        model = WHISPER_MODELS[model_id]
+
+        print(f"🎤 Transcribing with {model_id}: {Path(audio_file).name}")
+
+        # Transcribe with faster-whisper
+        segments, info = model.transcribe(
+            audio_file,
+            language=language,
+            beam_size=5,
+            best_of=5,
+            temperature=0.0
+        )
+
+        # Collect all segments
+        transcript_text = ""
+        for segment in segments:
+            transcript_text += segment.text + " "
+
+        transcript_text = transcript_text.strip()
+
+        print(f"✅ Transcription completed with {model_id}. Length: {len(transcript_text)} characters")
+        return transcript_text, f"Success - Duration: {info.duration:.1f}s"
+
+    except Exception as e:
+        print(f"❌ Error transcribing with {model_id}: {str(e)}")
+        return "", f"Error: {str(e)}"
+
+def evaluate_all_models(audio_file, reference_text, selected_models, progress=gr.Progress()):
+    """Evaluate all selected models and calculate WER/CER"""
+    if not audio_file or not reference_text.strip():
+        return "❌ Please provide both audio file and reference transcription", []
+
+    if not selected_models:
+        return "❌ Please select at least one model to evaluate", []
+
+    results = []
+    detailed_results = []
+
+    print(f"🎯 Starting WER evaluation with {len(selected_models)} models...")
+
+    for i, model_id in enumerate(selected_models):
+        progress((i + 1) / len(selected_models), desc=f"Evaluating {model_id}...")
+        print(f"\n🔄 Evaluating model: {model_id}")
+
+        # Transcribe with current model
+        start_time = time.time()
+        transcript, status = transcribe_audio_with_model(audio_file, model_id)
+        transcription_time = time.time() - start_time
+
+        if transcript:
+            # Calculate WER and CER
+            word_error_rate, char_error_rate, ref_norm, hyp_norm = calculate_wer_cer(reference_text, transcript)
+
+            # Store results
+            result = {
+                'model': model_id,
+                'model_name': AVAILABLE_WHISPER_MODELS.get(model_id, model_id),
+                'transcript': transcript,
+                'wer': word_error_rate,
+                'cer': char_error_rate,
+                'time': transcription_time,
+                'status': status,
+                'ref_normalized': ref_norm,
+                'hyp_normalized': hyp_norm
+            }
+
+            results.append(result)
+
+            print(f"✅ {model_id}: WER={word_error_rate:.3f}, CER={char_error_rate:.3f}")
+        else:
+            print(f"❌ {model_id}: Transcription failed")
+            results.append({
+                'model': model_id,
+                'model_name': AVAILABLE_WHISPER_MODELS.get(model_id, model_id),
+                'transcript': 'FAILED',
+                'wer': float('inf'),
+                'cer': float('inf'),
+                'time': transcription_time,
+                'status': status,
+                'ref_normalized': '',
+                'hyp_normalized': ''
+            })
+
+    # Sort results by WER (best first)
+    results.sort(key=lambda x: x['wer'])
+
+    # Create summary report
+    summary_report = "# 📊 WER Evaluation Results\n\n"
+    summary_report += f"**Audio File:** {os.path.basename(audio_file)}\n"
+    summary_report += f"**Reference Text:** {reference_text[:100]}...\n"
+    summary_report += f"**Models Tested:** {len(selected_models)}\n"
+    summary_report += f"**Device:** {DEVICE}\n\n"
+
+    # Add results summary
+    summary_report += "## Results Summary (sorted by WER)\n\n"
+    for i, result in enumerate(results):
+        if result['wer'] == float('inf'):
+            wer_display = "FAILED"
+            cer_display = "FAILED"
+        else:
+            wer_display = f"{result['wer']:.3f} ({result['wer']*100:.1f}%)"
+            cer_display = f"{result['cer']:.3f} ({result['cer']*100:.1f}%)"
+
+        summary_report += f"**{i+1}. {result['model_name']}**\n"
+        summary_report += f"- WER: {wer_display}\n"
+        summary_report += f"- CER: {cer_display}\n"
+        summary_report += f"- Processing Time: {result['time']:.2f}s\n\n"
+
+    # Create table data for Gradio with WER column
+    table_data = []
+
+    # Add ground truth row
+    table_data.append(["Ground Truth", reference_text, "N/A", "N/A"])
+
+    # Add model results
+    for result in results:
+        if result['wer'] == float('inf'):
+            wer_display = "FAILED"
+            cer_display = "FAILED"
+        else:
+            wer_display = f"{result['wer']:.3f}"
+            cer_display = f"{result['cer']:.3f}"
+
+        table_data.append([
+            result['model_name'],
+            result['transcript'],
+            wer_display,
+            cer_display
+        ])
+
+    print("✅ WER evaluation completed!")
+    return summary_report, table_data
+
+def create_gradio_interface():
+    """Create and configure the Gradio interface"""
+
+    # Initialize device info
+    global DEVICE
+    DEVICE = "cuda" if torch.cuda.is_available() else "cpu"
+
+    status_msg = f"""✅ Hebrew STT WER Evaluation Tool Ready!
+🔧 Device: {DEVICE}
+📱 Available Models: {len(AVAILABLE_WHISPER_MODELS)}
+🎯 Purpose: Compare WER performance across Hebrew STT models"""
+
+    # Create Gradio interface
+    with gr.Blocks(
+        title="Hebrew STT WER Evaluation",
+        theme=gr.themes.Soft(),
+        css="""
+        .gradio-container { max-width: 1600px !important; }
+        .evaluation-section {
+            border: 2px solid #e0e0e0;
+            border-radius: 10px;
+            padding: 15px;
+            margin: 10px 0;
+        }
+        """
+    ) as demo:
+
+        gr.Markdown("""
+        # 📊 Hebrew STT WER Evaluation Tool
+
+        Upload an audio file and reference transcription to test the performance of different Whisper models on Hebrew speech-to-text tasks.
+        """)
+
+        # Status section
+        with gr.Row():
+            status_display = gr.Textbox(
+                label="🔧 System Status",
+                value=status_msg,
+                interactive=False,
+                lines=4
+            )
+
+        # Input section
+        with gr.Row():
+            # Audio and Reference Input
+            with gr.Column(scale=1, elem_classes=["evaluation-section"]):
+                gr.Markdown("### 📝 Evaluation Inputs")
+
+                # Predefined audio selection
+                predefined_audio_dropdown = gr.Dropdown(
+                    label="🎵 Select Predefined Audio File",
+                    choices=[(f"{k} - {v['description']}", k) for k, v in PREDEFINED_AUDIO_FILES.items()],
+                    value=DEFAULT_AUDIO,  # was "web01.wav", which is not a key of PREDEFINED_AUDIO_FILES
+                    interactive=True
+                )
+
+                # OR upload custom audio
+                gr.Markdown("**OR**")
+
+                audio_input = gr.Audio(
+                    label="🎵 Upload Custom Audio File - Upload Hebrew audio file for transcription",
+                    type="filepath",
+                    value=None
+                )
+
+                reference_text = gr.Textbox(
+                    label="📝 Reference Transcription (Ground Truth) - The correct transcription for WER calculation",
+                    placeholder="Enter the correct transcription of the audio file...",
+                    value=DEFAULT_TRANSCRIPTION,
+                    lines=5
+                )
+
+                # Model selection
+                model_selection = gr.CheckboxGroup(
+                    label="🤖 Select Models to Test - Choose which models to evaluate (2-4 recommended)",
+                    choices=list(AVAILABLE_WHISPER_MODELS.keys()),
+                    value=["ivrit-ai/faster-whisper-v2-d4", "large-v3"]
+                )
+
+                with gr.Row():
+                    load_models_btn = gr.Button(
+                        "🔧 Pre-load Selected Models (Optional)",
+                        variant="secondary"
+                    )
+
+                    evaluate_btn = gr.Button(
+                        "🎯 Run WER Evaluation",
+                        variant="primary"
+                    )
+
+            # Quick info panel
+            with gr.Column(scale=1, elem_classes=["evaluation-section"]):
+                gr.Markdown("### 📊 WER Evaluation Results")
+
+                gr.Markdown("""
+                **What is WER?**
+                Word Error Rate - measures transcription accuracy at word level
+
+                **How it works:**
+                1. Upload Hebrew audio file
+                2. Enter correct transcription
+                3. Select models to test
+                4. Tool transcribes with each model
+                5. Calculates WER & CER for each model
+                6. Ranks models by performance
+
+                **Evaluation Metrics:**
+                - **WER**: Word-level errors (%)
+                - **CER**: Character-level errors (%)
+                - **Processing Time**: Transcription speed
+
+                **Tips:**
+                - Use high-quality audio
+                - Ensure reference transcription is accurate
+                - Select 2-4 models for comparison
+                - Lower WER = better performance
+                """)
+
+        # Results section
+        with gr.Row():
+            with gr.Column(scale=1):
+                gr.Markdown("### 📊 WER Evaluation Results")
+
+                results_output = gr.Markdown(
+                    value="Evaluation results will appear here after running the test..."
+                )
+
+                results_table = gr.Dataframe(
+                    label="Transcription Comparison",
+                    headers=["Model", "Transcription", "WER", "CER"],
+                    datatype=["str", "str", "str", "str"],
+                    col_count=(4, "fixed")
+                )
+
+
+
+        # Event handlers
+        def load_predefined_audio(selected_file):
+            """Load predefined audio file and its transcription"""
+            if selected_file and selected_file in PREDEFINED_AUDIO_FILES:
+                audio_data = PREDEFINED_AUDIO_FILES[selected_file]
+                return audio_data["file"], audio_data["transcription"]
+            return None, DEFAULT_TRANSCRIPTION
+
+        def load_selected_models(selected_models, progress=gr.Progress()):
+            """Pre-load selected models"""
+            if not selected_models:
+                return "❌ No models selected"
+
+            status_msg = f"🔧 Loading {len(selected_models)} models...\n\n"
+
+            for model_id in selected_models:
+                try:
+                    status_msg += f"⏳ Loading {model_id}...\n"
+                    success = initialize_whisper_model(model_id, progress)
+                    if success:
+                        status_msg += f"✅ {model_id} loaded successfully\n"
+                    else:
+                        status_msg += f"❌ Error loading {model_id}\n"
+                    status_msg += "\n"
+                except Exception as e:
+                    status_msg += f"❌ Error loading {model_id}: {str(e)}\n\n"
+
+            loaded_count = len([m for m in selected_models if m in WHISPER_MODELS and WHISPER_MODELS[m] is not None])
+            status_msg += f"✅ Model loading complete! Available: {loaded_count}/{len(selected_models)}"
+            return status_msg
+
+        def run_wer_evaluation(audio_file, reference, selected_models, predefined_file, progress=gr.Progress()):
+            """Run the complete WER evaluation"""
+            # Use predefined file if no custom audio is uploaded
+            if not audio_file and predefined_file:
+                audio_file = PREDEFINED_AUDIO_FILES[predefined_file]["file"]
+
+            if not audio_file:
+                return "❌ Please select a predefined audio file or upload a custom one", []
+
+            if not reference or not reference.strip():
+                return "❌ Please enter reference transcription", []
+
+            if not selected_models:
+                return "❌ Please select at least one model", []
+
+            # Run evaluation
+            results, table_data = evaluate_all_models(audio_file, reference, selected_models, progress)
+            return results, table_data
+
+        # Connect events
+        predefined_audio_dropdown.change(
+            fn=load_predefined_audio,
+            inputs=[predefined_audio_dropdown],
+            outputs=[audio_input, reference_text]
+        )
+
+        load_models_btn.click(
+            fn=load_selected_models,
+            inputs=[model_selection],
+            outputs=[status_display]
+        )
+
+        evaluate_btn.click(
+            fn=run_wer_evaluation,
+            inputs=[audio_input, reference_text, model_selection, predefined_audio_dropdown],
+            outputs=[results_output, results_table]
+        )
+
+        # Footer
+        gr.Markdown("""
+        ---
+        ### 🔧 Technical Information
+        - **STT Engine**: Faster-Whisper (optimized for Hebrew)
+        - **Evaluation Metrics**: WER (Word Error Rate) and CER (Character Error Rate)
+        - **Text Normalization**: Removes diacritics, punctuation, and extra whitespace
+        - **Purpose**: Compare performance of different transcription models on Hebrew text
+
+        ### 📦 Setup Instructions
+        ```bash
+        # Install dependencies (pydub, scipy and numpy are imported by app.py as well)
+        pip install gradio faster-whisper torch torchaudio jiwer pydub scipy numpy
+
+        # For GPU support (recommended)
+        pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
+        ```
+
+        ### 📊 Output Format
+        The tool displays:
+        - Model ranking by WER
+        - Detailed results for each model
+        - Processing times
+        - Normalized transcription comparison
+        """)
+
+    return demo
+
+# Launch the app
+if __name__ == "__main__":
+    print("🎯 Launching Hebrew STT WER Evaluation Tool...")
+    demo = create_gradio_interface()
+    # Launch the demo
+    demo.launch(
+        share=False,  # Set to True to create a public link
+        debug=True,
+        server_name="0.0.0.0",
+        server_port=7860,
+        show_error=True
+    )
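Review note: `calculate_wer_cer` in `app.py` normalizes both strings and then delegates to `jiwer`. To make the metric itself concrete, here is a minimal, dependency-free sketch of word error rate as word-level edit distance divided by the number of reference words. It is an illustration under stated assumptions, not the `jiwer` implementation: `edit_distance` and `simple_wer` are hypothetical names, and the normalization only mirrors the character range and regex used in the diff.

```python
import re


def normalize_hebrew_text(text: str) -> str:
    """Mirror of the normalization in app.py: drop marks, punctuation, extra spaces."""
    if not text:
        return ""
    # U+0591..U+05C7: Hebrew cantillation marks and niqqud points
    text = "".join(c for c in text if not (0x0591 <= ord(c) <= 0x05C7))
    text = re.sub(r"[^\w\s]", " ", text)
    return " ".join(text.split()).lower()


def edit_distance(ref: list, hyp: list) -> int:
    """Levenshtein distance between two token lists."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def simple_wer(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference word count."""
    ref_words = normalize_hebrew_text(reference).split()
    hyp_words = normalize_hebrew_text(hypothesis).split()
    if not ref_words:
        raise ValueError("reference is empty after normalization")
    return edit_distance(ref_words, hyp_words) / len(ref_words)
```

Because the denominator is the reference length, a hypothesis with many inserted words can score above 1.0, so the percentage shown in the summary report is not capped at 100%.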