---
title: Multilingual Audio Intelligence System
emoji: 🎵
colorFrom: blue
colorTo: purple
sdk: docker
pinned: false
short_description: AI system for multilingual transcription and translation
---
# 🎵 Multilingual Audio Intelligence System
<img src="static/imgs/banner.png" alt="Multilingual Audio Intelligence System Banner"/>
## Overview
The Multilingual Audio Intelligence System combines speaker diarization, automatic speech recognition, and neural machine translation in a single pipeline. It takes multilingual audio, identifies who spoke when, transcribes each speaker's speech, and translates the transcript into a target language, turning raw audio into structured output.
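To make "structured output" concrete, the sketch below shows one common way to combine diarization turns with transcript segments: each segment takes the speaker whose turn overlaps it the most. The `Turn` and `Segment` types, their fields, and the max-overlap rule are illustrative assumptions, not the project's actual data model in `src/`.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Turn:
    """Hypothetical diarization result: who spoke, and when (seconds)."""
    speaker: str
    start: float
    end: float


@dataclass
class Segment:
    """Hypothetical transcript span with optional speaker and translation."""
    start: float
    end: float
    text: str
    speaker: Optional[str] = None
    translation: Optional[str] = None


def overlap(a_start: float, a_end: float, b_start: float, b_end: float) -> float:
    """Length in seconds of the intersection of two intervals (0 if disjoint)."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def assign_speakers(segments: List[Segment], turns: List[Turn]) -> List[Segment]:
    """Label each segment with the speaker whose turn overlaps it the most."""
    for seg in segments:
        best = max(
            turns,
            key=lambda t: overlap(seg.start, seg.end, t.start, t.end),
            default=None,
        )
        # Leave the speaker unset when no turn overlaps the segment at all.
        if best is not None and overlap(seg.start, seg.end, best.start, best.end) > 0:
            seg.speaker = best.speaker
    return segments
```

A segment that falls entirely outside every turn keeps `speaker=None`, which makes unlabeled speech easy to spot downstream.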
## Features
### Demo Mode with Professional Audio Files
- **Yuri Kizaki - Japanese Audio**: Professional voice message about website communication
- **French Film Podcast**: Discussion about movies including Social Network and Paranormal Activity
- Smart demo file management with automatic download and preprocessing
- Near-instant demo results, served from cached processing output
### Enhanced User Interface
- **Audio Waveform Visualization**: Real-time waveform display with HTML5 Canvas
- **Interactive Demo Selection**: Beautiful cards for selecting demo audio files
- **Improved Transcript Display**: Color-coded confidence levels and clear translation sections
- **Professional Audio Preview**: Audio player with waveform visualization
### Screenshots
#### 🎬 Demo Banner
<img src="static/imgs/demo_banner.png" alt="Demo Banner"/>
#### Transcript with Translation
<img src="static/imgs/demo_res_transcript_translate.png" alt="Transcript with Translation"/>
#### Visual Representation
<p align="center">
<img src="static/imgs/demo_res_visual.png" alt="Visual Output"/>
</p>
#### 🧠 Summary Output
<img src="static/imgs/demo_res_summary.png" alt="Summary Output"/>
## Demo & Documentation
- 🎥 [Video Preview](https://drive.google.com/file/d/1dfYM5p9cKGw0C5RBvmyN6DUWgnEZk56M/view)
- [Project Documentation](DOCUMENTATION.md)
## Installation and Quick Start
1. **Clone the Repository:**
```bash
git clone https://github.com/Prathameshv07/Multilingual-Audio-Intelligence-System.git
cd Multilingual-Audio-Intelligence-System
```
2. **Create and Activate Conda Environment:**
```bash
conda create --name audio_challenge python=3.9
conda activate audio_challenge
```
3. **Install Dependencies:**
```bash
pip install -r requirements.txt
```
4. **Configure Environment Variables:**
```bash
cp config.example.env .env
# Edit .env file with your HUGGINGFACE_TOKEN for accessing gated models
```
5. **Preload AI Models (Recommended):**
```bash
python model_preloader.py
```
6. **Initialize Application:**
```bash
python run_fastapi.py
```
## File Structure
```
Multilingual-Audio-Intelligence-System/
├── web_app.py                  # FastAPI application with RESTful endpoints
├── model_preloader.py          # Intelligent model loading with progress tracking
├── run_fastapi.py              # Application startup script with preloading
├── src/
│   ├── main.py                 # AudioIntelligencePipeline orchestrator
│   ├── audio_processor.py      # Advanced audio preprocessing and normalization
│   ├── speaker_diarizer.py     # pyannote.audio integration for speaker identification
│   ├── speech_recognizer.py    # faster-whisper ASR with language detection
│   ├── translator.py           # Neural machine translation with multiple models
│   ├── output_formatter.py     # Multi-format result generation and export
│   └── utils.py                # Utility functions and performance monitoring
├── templates/
│   └── index.html              # Responsive web interface with home page
├── static/                     # Static assets and client-side resources
├── model_cache/                # Intelligent model caching directory
├── uploads/                    # User audio file storage
├── outputs/                    # Generated results and downloads
├── requirements.txt            # Comprehensive dependency specification
├── Dockerfile                  # Production-ready containerization
└── config.example.env          # Environment configuration template
```
## Configuration
### Environment Variables
Create a `.env` file:
```env
# Optional, for gated models
HUGGINGFACE_TOKEN=hf_your_token_here
```
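How the application itself reads this file is not shown here. As a minimal, stdlib-only sketch of the idea, the helpers below (`parse_env_lines` and `load_token`, both hypothetical names) read `HUGGINGFACE_TOKEN` from the process environment first and fall back to a `.env` file. The parsing rules are a simplification, not a full dotenv implementation.

```python
import os
from typing import Dict, Iterable


def parse_env_lines(lines: Iterable[str]) -> Dict[str, str]:
    """Parse KEY=VALUE lines, skipping blanks and full-line comments."""
    values: Dict[str, str] = {}
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Simplification: drop a trailing " # comment" and surrounding quotes.
        value = value.split(" #", 1)[0].strip().strip("'\"")
        values[key.strip()] = value
    return values


def load_token(path: str = ".env") -> str:
    """Return HUGGINGFACE_TOKEN from the environment or the .env file, else ""."""
    token = os.environ.get("HUGGINGFACE_TOKEN", "")
    if not token and os.path.exists(path):
        with open(path, encoding="utf-8") as fh:
            token = parse_env_lines(fh).get("HUGGINGFACE_TOKEN", "")
    return token
```

An empty return value means no token was found, which should only matter when a gated model is requested.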
### Model Configuration
- **Whisper Model**: tiny/small/medium/large
- **Target Language**: en/es/fr/de/it/pt/zh/ja/ko/ar
- **Device**: auto/cpu/cuda
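The options above can be checked before any model is loaded. The following is a sketch only: `PipelineConfig` and its default values are hypothetical, and only the allowed values are taken from the list above.

```python
from dataclasses import dataclass

# Allowed values as listed in this README.
WHISPER_MODELS = ("tiny", "small", "medium", "large")
TARGET_LANGUAGES = ("en", "es", "fr", "de", "it", "pt", "zh", "ja", "ko", "ar")
DEVICES = ("auto", "cpu", "cuda")


@dataclass(frozen=True)
class PipelineConfig:
    """Hypothetical settings holder; the defaults are assumptions."""
    whisper_model: str = "small"
    target_language: str = "en"
    device: str = "auto"

    def __post_init__(self) -> None:
        # Fail early with a message that names the offending option.
        checks = (
            ("whisper_model", self.whisper_model, WHISPER_MODELS),
            ("target_language", self.target_language, TARGET_LANGUAGES),
            ("device", self.device, DEVICES),
        )
        for name, value, allowed in checks:
            if value not in allowed:
                raise ValueError(f"{name}={value!r} is not one of {allowed}")
```

For example, `PipelineConfig(whisper_model="tiny", device="cpu")` would suit limited hardware, while an unknown model name raises `ValueError` immediately.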
## Supported Audio Formats
- WAV (recommended)
- MP3
- OGG
- FLAC
- M4A
**Maximum file size**: 100MB
**Recommended duration**: Under 30 minutes
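A pre-upload check against these limits might look like the sketch below. `check_upload` is a hypothetical helper, and it assumes "100MB" means 100 MiB and that the format is judged by file extension alone; the application's real validation may differ.

```python
from pathlib import Path
from typing import List

ALLOWED_EXTENSIONS = {".wav", ".mp3", ".ogg", ".flac", ".m4a"}
MAX_FILE_BYTES = 100 * 1024 * 1024  # assumption: "100MB" read as 100 MiB


def check_upload(filename: str, size_bytes: int) -> List[str]:
    """Return a list of problems; an empty list means the file looks acceptable."""
    problems: List[str] = []
    ext = Path(filename).suffix.lower()
    if ext not in ALLOWED_EXTENSIONS:
        problems.append(f"unsupported format: {ext or '(none)'}")
    if size_bytes > MAX_FILE_BYTES:
        problems.append(f"file is {size_bytes} bytes; limit is {MAX_FILE_BYTES}")
    return problems
```

Returning a list rather than raising lets a caller report every problem at once.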
## Development
### Local Development
```bash
python run_fastapi.py
```
### Production Deployment
```bash
uvicorn web_app:app --host 0.0.0.0 --port 8000
```
## Performance
- **Processing Speed**: 2-14x real-time (depending on model size)
- **Memory Usage**: Optimized with INT8 quantization
- **CPU Optimized**: Works without GPU
- **Concurrent Processing**: Async/await support
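As a rough worked example of the speed range, assume "Nx real-time" means N seconds of audio are processed per second of wall-clock time. Under that reading, a 30-minute file takes between about 2 and 15 minutes. The function name below is hypothetical.

```python
def processing_seconds(audio_seconds: float, speedup: float) -> float:
    """Estimated wall-clock time for audio processed at `speedup` x real-time."""
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return audio_seconds / speedup


thirty_minutes = 30 * 60                                # 1800 seconds of audio
slowest = processing_seconds(thirty_minutes, 2)         # 900.0 s, i.e. 15 minutes
fastest = processing_seconds(thirty_minutes, 14)        # about 128.6 s
```

These figures cover transcription throughput only under the stated assumption; model loading and translation time are not included.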
## Troubleshooting
### Common Issues
1. **Dependencies**: Use `requirements.txt` for clean installation
2. **Memory**: Use smaller models (tiny/small) for limited hardware
3. **Audio Format**: Convert to WAV if other formats fail
4. **Port Conflicts**: Change port in `run_fastapi.py` if 8000 is occupied
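For the port-conflict case, a quick check can be run before starting the server. This is a sketch using only the standard `socket` module; `port_is_free` and `first_free_port` are hypothetical helpers, not part of `run_fastapi.py`.

```python
import socket


def port_is_free(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if a TCP socket can be bound to (host, port) right now."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            return False
    return True


def first_free_port(start: int = 8000, attempts: int = 20) -> int:
    """Return the first free port in [start, start + attempts)."""
    for port in range(start, start + attempts):
        if port_is_free(port):
            return port
    raise RuntimeError(f"no free port found in {start}-{start + attempts - 1}")
```

The result is only a snapshot: another process can claim the port between the check and the server start, so the server's own bind error still needs handling.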
### Error Resolution
- Check logs in terminal output
- Verify audio file format and size
- Ensure all dependencies are installed
- Check available system memory
## Support
- **Documentation**: Check `/api/docs` endpoint
- **System Info**: Use the info button in the web interface
- **Logs**: Monitor terminal output for detailed information
---
Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference