whisper-large-v3-srt

Running on Zero

datxy committed 23 days ago

Commit

0c59afa

verified ·

1 Parent(s): 1433841

Update README.md

Files changed (1)

README.md +54 -8

README.md CHANGED

@@ -1,14 +1,60 @@
 ---
-title: Whisper Large V3
-emoji: 🤫
-colorFrom: indigo
-colorTo: red
 sdk: gradio
-sdk_version: 4.37.2
 app_file: app.py
 pinned: false
-tags:
-- whisper-event
 ---
-Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

 ---
+title: Whisper V3 - Energy Based SRT
+emoji: 🎧
+colorFrom: blue
+colorTo: purple
 sdk: gradio
+sdk_version: "4.36.1"
 app_file: app.py
 pinned: false
 ---
+# Whisper Large V3 - Energy Based Subtitle Generator
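+This Space uses **[openai/whisper-large-v3](https://huggingface.co/openai/whisper-large-v3)** for speech recognition
+and splits subtitles by automatically detecting silent stretches from the **audio energy (RMS → dB)**.
+Features:
+- ✅ Handles long audio automatically (chunked inference)
+- ✅ Splits on detected silence rather than punctuation (a pause ≥ 0.2 s starts a new segment)
+- ✅ Outputs a standard `.srt` file (no cue numbers, which makes it easier to edit)
+- ✅ Works in **GPU / CPU / ZeroGPU** environments and avoids AcceleratorError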
+---
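+## 🚀 Usage
+1. Upload an audio file (mp3, wav, flac, and other common formats are supported).
+2. Choose a task mode:
+   - `transcribe`: transcribe in the original language
+   - `translate`: translate into English
+3. Click Submit and wait for processing to finish.
+4. Below:
+   - The preview area shows the subtitles (SRT format, no cue numbers).
+   - The download button retrieves the `.srt` file.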
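The README says the output is SRT without cue numbers. As a rough illustration of what that could look like, here is a minimal sketch; `srt_timestamp` and `to_srt` are hypothetical names, not functions taken from `app.py`, and the cue tuple layout `(start_s, end_s, text)` is an assumption.

```python
def srt_timestamp(seconds):
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    ms = round(seconds * 1000)
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


def to_srt(cues, numbered=False):
    """Join (start_s, end_s, text) cues into SRT text.

    With numbered=False the index line is omitted, matching the
    "no cue numbers" output the README describes.
    """
    blocks = []
    for i, (start, end, text) in enumerate(cues, 1):
        lines = [f"{srt_timestamp(start)} --> {srt_timestamp(end)}", text.strip()]
        if numbered:
            lines.insert(0, str(i))
        blocks.append("\n".join(lines))
    return "\n\n".join(blocks) + "\n"
```

Some players expect the index line, so passing `numbered=True` in this sketch restores the conventional layout.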
+---
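+## ⚙️ Parameters
+- `SILENCE_MIN_LEN = 0.20` → a pause of ≥ 0.2 s counts as a silent stretch
+- `DB_DROP = 25.0` → silence threshold: more than 25 dB below the peak energy
+- `PCTL_FLOOR = 20.0` → energy percentile threshold (keeps the noise floor from being set too low)
+- `MIN_SEG_DUR = 0.30` → each segment is shown for at least 0.3 s to avoid flicker
+These parameters can be adjusted in **`app.py`** as needed.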
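To make the roles of these four parameters concrete, the following is a minimal pure-Python sketch of energy-based silence detection. It is not the code in `app.py`: the 20 ms frame length, the helper names, the way `DB_DROP` and `PCTL_FLOOR` are combined (taking the higher of the two thresholds), and the way `MIN_SEG_DUR` is enforced (extending short segments) are all assumptions made for illustration.

```python
import math

# Values quoted from the README; how app.py combines them is an assumption here.
SILENCE_MIN_LEN = 0.20  # seconds of quiet needed to count as a pause
DB_DROP = 25.0          # dB below the loudest frame
PCTL_FLOOR = 20.0       # percentile of frame levels used as a lower bound
MIN_SEG_DUR = 0.30      # minimum display time of a segment, in seconds


def frame_levels_db(samples, sr, frame_s=0.02):
    """RMS level in dB of consecutive, non-overlapping frames (hypothetical helper)."""
    n = max(1, round(sr * frame_s))
    levels = []
    for start in range(0, len(samples) - n + 1, n):
        frame = samples[start:start + n]
        rms = math.sqrt(sum(x * x for x in frame) / n)
        levels.append(20.0 * math.log10(max(rms, 1e-10)))  # clamp avoids log10(0)
    return levels, n


def percentile(values, pct):
    """Nearest-rank percentile of a non-empty list."""
    ordered = sorted(values)
    return ordered[round(pct / 100.0 * (len(ordered) - 1))]


def find_silences(samples, sr, frame_s=0.02):
    """Return (start_s, end_s) pauses lasting at least SILENCE_MIN_LEN."""
    levels, n = frame_levels_db(samples, sr, frame_s)
    if not levels:
        return []
    # Assumed combination: the higher of "peak minus DB_DROP" and the percentile floor.
    threshold = max(max(levels) - DB_DROP, percentile(levels, PCTL_FLOOR))
    silences, run_start = [], None
    for i, level in enumerate(levels + [float("inf")]):  # sentinel closes a trailing run
        if level < threshold:
            if run_start is None:
                run_start = i
        elif run_start is not None:
            if (i - run_start) * n / sr >= SILENCE_MIN_LEN:
                silences.append((run_start * n / sr, i * n / sr))
            run_start = None
    return silences


def speech_segments(silences, total_s):
    """Cut [0, total_s] at the pauses; extend segments shorter than MIN_SEG_DUR."""
    segments, cursor = [], 0.0
    for start_s, end_s in silences:
        if start_s > cursor:
            segments.append((cursor, start_s))
        cursor = end_s
    if cursor < total_s:
        segments.append((cursor, total_s))
    return [(s, max(e, s + MIN_SEG_DUR)) for s, e in segments]
```

In this sketch, raising `DB_DROP` makes the detector less willing to call a frame silent, while raising `SILENCE_MIN_LEN` merges short pauses into the surrounding speech.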
+---
+## 📦 Dependencies (requirements.txt)
+```txt
+torch>=2.3.0
+torchaudio>=2.3.0
+transformers>=4.43.0
+accelerate>=0.32.0
+huggingface_hub>=0.23.0
+datasets>=2.19.0
+gradio>=4.36.1
+librosa>=0.10.1
+soundfile>=0.12.1
+numpy>=1.24.0
+scipy>=1.12.0