---
title: Transcription
emoji: π
colorFrom: yellow
colorTo: pink
sdk: gradio
sdk_version: 5.15.0
app_file: app.py
pinned: false
short_description: This tool is intended to help transcribe interviews.
---
# Audio Transcription App
A Gradio-based web application for transcribing audio files (MP3 or M4A) using OpenAI's Whisper model. It is aimed at interviews and other long recordings, and offers optional silence removal and audio chunking.
## Features
- **Multiple Audio File Support**: Process multiple MP3 or M4A files simultaneously
- **Silence Removal**: Option to remove silence from audio to reduce processing time and improve accuracy
- **Audio Chunking**: Split long audio files into manageable chunks for better processing
- **Multiple Language Support**: Supports German (de), English (en), French (fr), Spanish (es), and Italian (it)
- **Multiple Whisper Models**: Choose from various Whisper model sizes (tiny to large-v3-turbo) based on your needs
- **Detailed Output**: Get both full transcriptions and segment-wise transcriptions with timestamps
- **Download Results**: All processed files and transcripts are provided in a convenient ZIP file
## Setup
1. Clone the repository
2. Install the required dependencies:
```bash
pip install -r requirements.txt
```
3. Make sure you have ffmpeg installed on your system
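To confirm that ffmpeg is reachable before starting the app, a small check along these lines can help. This is a sketch, not part of `app.py`; the function names `find_ffmpeg` and `ffmpeg_version` are hypothetical.

```python
import shutil
import subprocess
from typing import Optional


def find_ffmpeg(executable: str = "ffmpeg") -> Optional[str]:
    """Return the full path of the ffmpeg executable, or None if it is not on PATH."""
    return shutil.which(executable)


def ffmpeg_version(executable: str = "ffmpeg") -> Optional[str]:
    """Return the first line of `ffmpeg -version`, or None if ffmpeg is missing."""
    path = find_ffmpeg(executable)
    if path is None:
        return None
    result = subprocess.run(
        [path, "-version"], capture_output=True, text=True, check=False
    )
    lines = result.stdout.splitlines()
    return lines[0] if lines else None


if __name__ == "__main__":
    version = ffmpeg_version()
    if version is None:
        print("ffmpeg was not found on PATH; install it before running the app.")
    else:
        print(version)
```

If ffmpeg lives outside `PATH`, pass its full path instead of the default name.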
## Usage
1. Run the application:
```bash
python app.py
```
2. Open the provided local URL in your web browser
3. Upload your audio file(s)
4. Configure the settings:
   - Enable/disable silence removal
   - Enable/disable audio chunking
   - Select the Whisper model size
   - Choose the target language
5. Click "Process" to start transcription
6. View the results and download the ZIP file containing all processed files
## Settings
### Silence Removal
- **Minimum Silence Length**: 100-2000ms (default: 500ms)
- **Silence Threshold**: -70 to -30dB (default: -50dB)
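How the two silence settings interact can be illustrated with a small pure-Python sketch. It assumes the audio has already been reduced to one loudness value (in dB) per fixed-length frame; the function name `find_silences` and the `frame_ms` parameter are hypothetical, and the app itself may rely on an audio library instead.

```python
from typing import List, Tuple


def find_silences(
    levels_db: List[float],
    min_silence_len_ms: int = 500,
    silence_thresh_db: float = -50.0,
    frame_ms: int = 10,
) -> List[Tuple[int, int]]:
    """Return (start_ms, end_ms) spans that stay below the threshold
    for at least min_silence_len_ms."""
    silences: List[Tuple[int, int]] = []
    run_start = None  # index of the first frame of the current quiet run

    for i, level in enumerate(levels_db):
        if level < silence_thresh_db:
            if run_start is None:
                run_start = i
        elif run_start is not None:
            # The quiet run ended; keep it only if it was long enough.
            if (i - run_start) * frame_ms >= min_silence_len_ms:
                silences.append((run_start * frame_ms, i * frame_ms))
            run_start = None

    # Handle a quiet run that lasts until the end of the audio.
    if run_start is not None:
        end = len(levels_db)
        if (end - run_start) * frame_ms >= min_silence_len_ms:
            silences.append((run_start * frame_ms, end * frame_ms))
    return silences
```

A lower threshold (for example -70dB) treats only very quiet passages as silence, while a shorter minimum length removes brief pauses as well.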
### Chunking
- **Chunk Duration**: 60-3600 seconds (default: 600 seconds/10 minutes)
- **FFmpeg Path**: Path to ffmpeg executable (default: "ffmpeg")
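One way to split a file into fixed-length chunks is ffmpeg's `segment` muxer. The sketch below only builds the command line from the two settings above; whether `app.py` invokes ffmpeg this way is an assumption, and `build_chunk_command` and the output pattern are hypothetical names.

```python
from typing import List


def build_chunk_command(
    input_path: str,
    output_pattern: str,
    chunk_seconds: int = 600,
    ffmpeg_path: str = "ffmpeg",
) -> List[str]:
    """Build an ffmpeg command that splits input_path into chunks of chunk_seconds."""
    if not 60 <= chunk_seconds <= 3600:
        raise ValueError("chunk_seconds must be between 60 and 3600")
    return [
        ffmpeg_path,
        "-i", input_path,
        "-f", "segment",                      # use the segment muxer
        "-segment_time", str(chunk_seconds),  # target length of each chunk
        "-c", "copy",                         # copy the stream without re-encoding
        output_pattern,                       # e.g. "chunk_%03d.mp3"
    ]
```

The resulting list can be passed to `subprocess.run(command, check=True)`. Because `-c copy` avoids re-encoding, chunk boundaries may not fall exactly on the requested duration.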
### Transcription
- **Model Size**: Choose from tiny, base, small, medium, large, large-v2, large-v3, turbo, or large-v3-turbo
- **Language**: German (de), English (en), French (fr), Spanish (es), Italian (it)
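As a rough sketch of how these two settings map onto the `openai-whisper` package, the snippet below validates the choices and then calls `whisper.load_model` and `model.transcribe`. The helper names `validate_settings` and `transcribe_file` are hypothetical, and the app's actual code may differ.

```python
from typing import Tuple

MODEL_SIZES = (
    "tiny", "base", "small", "medium", "large",
    "large-v2", "large-v3", "turbo", "large-v3-turbo",
)
LANGUAGES = {
    "de": "German", "en": "English", "fr": "French",
    "es": "Spanish", "it": "Italian",
}


def validate_settings(model_size: str, language: str) -> Tuple[str, str]:
    """Check the settings against the values listed in this README."""
    if model_size not in MODEL_SIZES:
        raise ValueError(f"Unknown model size: {model_size!r}")
    if language not in LANGUAGES:
        raise ValueError(f"Unsupported language code: {language!r}")
    return model_size, language


def transcribe_file(audio_path: str, model_size: str, language: str) -> dict:
    """Transcribe one audio file and return Whisper's result dictionary."""
    validate_settings(model_size, language)
    import whisper  # imported lazily; requires the openai-whisper package

    model = whisper.load_model(model_size)
    return model.transcribe(audio_path, language=language)
```

The returned dictionary contains the full text under `"text"` and the timestamped pieces under `"segments"`. Larger models are generally more accurate but slower and need more memory.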
## Output
- **Full Transcription**: Complete text of the audio file
- **Segmented Transcription**: Text segments with timestamps
- **ZIP File**: Contains:
  - Processed audio files
  - Individual transcript files
  - Combined transcript file
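The outputs above could be produced roughly as follows. This sketch assumes segments shaped like Whisper's result (`start`, `end`, and `text` keys); the timestamp layout, the file names, and the helper names are hypothetical and may not match what the app writes.

```python
import io
import zipfile
from typing import Dict, Iterable, Mapping


def format_timestamp(seconds: float) -> str:
    """Format a time offset in seconds as HH:MM:SS (fractions are truncated)."""
    hours, remainder = divmod(int(seconds), 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def format_segments(segments: Iterable[Mapping]) -> str:
    """Render one line per segment: [start - end] text."""
    return "\n".join(
        f"[{format_timestamp(seg['start'])} - {format_timestamp(seg['end'])}] "
        f"{seg['text'].strip()}"
        for seg in segments
    )


def build_zip(files: Dict[str, str]) -> bytes:
    """Pack text files (name -> content) into an in-memory ZIP archive."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for name, content in files.items():
            archive.writestr(name, content)
    return buffer.getvalue()
```

For example, `build_zip({"interview_segments.txt": format_segments(result["segments"])})` returns bytes that can be written to disk or offered as a download.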
## Deployment on Hugging Face Spaces
1. Create a new Space on Hugging Face
2. Choose "Gradio" as the SDK
3. Upload the following files:
   - app.py
   - requirements.txt
4. The app will automatically deploy and be available at your Space's URL
## Requirements
- Python 3.10+ (required by Gradio 5, which this Space uses)
- ffmpeg
- See requirements.txt for Python package dependencies
## License
This project is open source and available under the MIT License.
## Acknowledgments
- [OpenAI Whisper](https://github.com/openai/whisper)
- [Gradio](https://gradio.app/)
- [FFmpeg](https://ffmpeg.org/)