---
title: Multilingual Audio Intelligence System
emoji: 🎵
colorFrom: blue
colorTo: purple
sdk: docker
pinned: false
short_description: AI system for multilingual transcription and translation
---

# 🎵 Multilingual Audio Intelligence System

## Overview
The Multilingual Audio Intelligence System is an AI-powered platform that combines state-of-the-art speaker diarization, automatic speech recognition, and neural machine translation into a single audio-analysis pipeline. It processes multilingual audio, identifies individual speakers, transcribes their speech with high accuracy, and translates the results across multiple languages, turning raw audio into structured, actionable insights.
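The actual orchestration lives in `src/main.py` as `AudioIntelligencePipeline`. The sketch below is not that class; it is a minimal illustration, under assumed model choices, of how the three underlying libraries named in this repository (pyannote.audio for diarization, faster-whisper for ASR, and a translation model) can be chained together.

```python
# Minimal sketch of the diarize -> transcribe -> translate flow.
# Not the project's AudioIntelligencePipeline; model names and the input file are illustrative.
from pyannote.audio import Pipeline
from faster_whisper import WhisperModel
from transformers import pipeline as hf_pipeline

AUDIO = "meeting.wav"  # hypothetical input file

# 1. Speaker diarization (gated model; needs a Hugging Face token)
diarizer = Pipeline.from_pretrained(
    "pyannote/speaker-diarization-3.1", use_auth_token="hf_your_token_here"
)
diarization = diarizer(AUDIO)

# 2. Speech recognition with automatic language detection
asr = WhisperModel("small", device="cpu", compute_type="int8")
segments, info = asr.transcribe(AUDIO)

# 3. Translate each transcribed segment (example Japanese-to-English model)
translate = hf_pipeline("translation", model="Helsinki-NLP/opus-mt-ja-en")

for segment in segments:
    english = translate(segment.text)[0]["translation_text"]
    print(f"[{segment.start:6.1f}s - {segment.end:6.1f}s] {segment.text} -> {english}")

# Speaker turns can then be aligned with transcript segments by timestamp
for turn, _, speaker in diarization.itertracks(yield_label=True):
    print(f"{speaker}: {turn.start:.1f}s - {turn.end:.1f}s")
```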
## Features

### Demo Mode with Professional Audio Files
- Yuri Kizaki - Japanese Audio: Professional voice message about website communication
- French Film Podcast: Discussion about movies including Social Network and Paranormal Activity
- Smart demo file management with automatic download and preprocessing (see the sketch after this list)
- Instant results through cached processing for fast demonstrations
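The demo-file handling described above can be pictured as a small download-and-cache step; the sketch below is hypothetical, and the file names, URLs, and cache directory are placeholders rather than the project's actual values.

```python
# Hypothetical demo-file cache: download each file once, then reuse the local copy.
# URLs and paths are placeholders, not the project's real configuration.
from pathlib import Path
import urllib.request

DEMO_DIR = Path("uploads/demo")  # "uploads/" exists in the repo; the "demo" subfolder is assumed
DEMO_FILES = {
    "yuri_kizaki_ja.wav": "https://example.com/demo/yuri_kizaki_ja.wav",
    "french_film_podcast.wav": "https://example.com/demo/french_film_podcast.wav",
}

def ensure_demo_files() -> list[Path]:
    """Download any missing demo files and return their local paths."""
    DEMO_DIR.mkdir(parents=True, exist_ok=True)
    paths = []
    for name, url in DEMO_FILES.items():
        target = DEMO_DIR / name
        if not target.exists():  # a cached copy skips the download entirely
            urllib.request.urlretrieve(url, target)
        paths.append(target)
    return paths
```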
### Enhanced User Interface
- Audio Waveform Visualization: Real-time waveform display with HTML5 Canvas
- Interactive Demo Selection: Beautiful cards for selecting demo audio files
- Improved Transcript Display: Color-coded confidence levels and clear translation sections
- Professional Audio Preview: Audio player with waveform visualization
## Screenshots

### 🎬 Demo Banner

### Transcript with Translation

### Visual Representation

### Summary Output

## Demo & Documentation

- 🎥 Video Preview
- Project Documentation
## Installation and Quick Start

Clone the Repository:

```bash
git clone https://github.com/Prathameshv07/Multilingual-Audio-Intelligence-System.git
cd Multilingual-Audio-Intelligence-System
```

Create and Activate Conda Environment:

```bash
conda create --name audio_challenge python=3.9
conda activate audio_challenge
```

Install Dependencies:

```bash
pip install -r requirements.txt
```

Configure Environment Variables:

```bash
cp config.example.env .env
# Edit the .env file with your HUGGINGFACE_TOKEN for accessing gated models
```

Preload AI Models (Recommended):

```bash
python model_preloader.py
```

Initialize Application:

```bash
python run_fastapi.py
```
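`run_fastapi.py` starts the server after the models are preloaded. As a rough, assumed illustration (the real script may differ), the startup sequence boils down to something like:

```python
# Rough sketch of a startup script that preloads models, then serves the app.
# The actual run_fastapi.py may differ; this only illustrates the sequence.
import subprocess
import sys

import uvicorn

if __name__ == "__main__":
    # Warm the model cache first (equivalent to running `python model_preloader.py`)
    subprocess.run([sys.executable, "model_preloader.py"], check=True)
    # Then serve the FastAPI app defined in web_app.py
    uvicorn.run("web_app:app", host="0.0.0.0", port=8000)
```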
## File Structure

```
Multilingual-Audio-Intelligence-System/
├── web_app.py            # FastAPI application with RESTful endpoints
├── model_preloader.py    # Intelligent model loading with progress tracking
├── run_fastapi.py        # Application startup script with preloading
├── src/
│   ├── main.py               # AudioIntelligencePipeline orchestrator
│   ├── audio_processor.py    # Advanced audio preprocessing and normalization
│   ├── speaker_diarizer.py   # pyannote.audio integration for speaker identification
│   ├── speech_recognizer.py  # faster-whisper ASR with language detection
│   ├── translator.py         # Neural machine translation with multiple models
│   ├── output_formatter.py   # Multi-format result generation and export
│   └── utils.py              # Utility functions and performance monitoring
├── templates/
│   └── index.html            # Responsive web interface with home page
├── static/               # Static assets and client-side resources
├── model_cache/          # Intelligent model caching directory
├── uploads/              # User audio file storage
├── outputs/              # Generated results and downloads
├── requirements.txt      # Comprehensive dependency specification
├── Dockerfile            # Production-ready containerization
└── config.example.env    # Environment configuration template
```
## Configuration

### Environment Variables

Create a `.env` file:

```bash
HUGGINGFACE_TOKEN=hf_your_token_here  # Optional, for gated models
```
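At runtime the token is typically read from the environment; a minimal sketch, assuming python-dotenv (the project's actual loading code may differ):

```python
# Sketch: read HUGGINGFACE_TOKEN from .env (assumes python-dotenv is installed).
import os
from dotenv import load_dotenv

load_dotenv()  # copies key=value pairs from .env into the process environment
hf_token = os.getenv("HUGGINGFACE_TOKEN")  # None if unset; gated models then stay unavailable
```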
### Model Configuration
- Whisper Model: tiny/small/medium/large
- Target Language: en/es/fr/de/it/pt/zh/ja/ko/ar
- Device: auto/cpu/cuda
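A minimal sketch of how these options might map onto model setup, assuming faster-whisper for ASR; the variable names below are illustrative, not the project's configuration keys:

```python
# Illustrative mapping from the options above to model setup (not the project's code).
import torch
from faster_whisper import WhisperModel

whisper_size = "small"   # tiny / small / medium / large
target_language = "en"   # en / es / fr / de / it / pt / zh / ja / ko / ar (used by the translation stage, not shown)
device_setting = "auto"  # auto / cpu / cuda

# Resolve "auto" to whatever hardware is available
device = ("cuda" if torch.cuda.is_available() else "cpu") if device_setting == "auto" else device_setting
compute_type = "int8" if device == "cpu" else "float16"  # INT8 keeps CPU memory usage low

asr = WhisperModel(whisper_size, device=device, compute_type=compute_type)
```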
## Supported Audio Formats
- WAV (recommended)
- MP3
- OGG
- FLAC
- M4A
Maximum file size: 100MB
Recommended duration: Under 30 minutes
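If a file in another format fails, converting it to WAV usually resolves the problem. One way to do that from Python, assuming pydub and FFmpeg are installed (the bundled `audio_processor.py` may handle this differently):

```python
# Convert a supported input (MP3, OGG, FLAC, M4A, ...) to 16 kHz mono WAV.
# Assumes pydub + FFmpeg; file names are placeholders.
from pydub import AudioSegment

audio = AudioSegment.from_file("interview.m4a")      # format inferred from the extension
audio = audio.set_channels(1).set_frame_rate(16000)  # mono, 16 kHz is typical for ASR
audio.export("interview.wav", format="wav")
```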
## Development

### Local Development

```bash
python run_fastapi.py
```

### Production Deployment

```bash
uvicorn web_app:app --host 0.0.0.0 --port 8000
```
## Performance
- Processing Speed: 2-14x real-time (depending on model size)
- Memory Usage: Optimized with INT8 quantization
- CPU Optimized: Works without GPU
- Concurrent Processing: Async/await support
## Troubleshooting

### Common Issues

- Dependencies: Use `requirements.txt` for a clean installation
- Memory: Use smaller models (tiny/small) on limited hardware
- Audio Format: Convert to WAV if other formats fail
- Port Conflicts: Change the port in `run_fastapi.py` if 8000 is occupied
### Error Resolution
- Check logs in terminal output
- Verify audio file format and size
- Ensure all dependencies are installed
- Check available system memory
## Support

- Documentation: Check the `/api/docs` endpoint (a quick connectivity check is sketched below)
- System Info: Use the info button in the web interface
- Logs: Monitor terminal output for detailed information
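A quick way to confirm a locally running instance is serving the interactive docs (assumes the default host and port used in the commands above):

```python
# Quick connectivity check against a local instance on the default port.
import requests

resp = requests.get("http://localhost:8000/api/docs", timeout=5)
print(resp.status_code)  # 200 means the FastAPI docs page is reachable
```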
Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference