Prathamesh Sarjerao Vaidya
committed · Commit 5e6e4ea · Parent(s): 3f792e8

made some changes

Browse files:
- DOCUMENTATION.md +9 -10
- README.md +60 -88
- static/imgs/banner.png +3 -0
- static/imgs/demo_banner.png +3 -0
- static/imgs/demo_res_summary.png +3 -0
- static/imgs/demo_res_transcript_translate.png +3 -0
- static/imgs/demo_res_visual.png +3 -0
DOCUMENTATION.md
CHANGED

````diff
@@ -16,7 +16,7 @@ The primary objective of the Multilingual Audio Intelligence System is to revolu
 
 ## 3. Technologies and Tools
 
-- **Programming Language:** Python 3.
+- **Programming Language:** Python 3.8+
 - **Web Framework:** FastAPI with Uvicorn ASGI server for high-performance async operations
 - **Frontend Technology:** HTML5, TailwindCSS, and Vanilla JavaScript for responsive user interface
 - **Machine Learning Libraries:**
@@ -45,7 +45,7 @@ The primary objective of the Multilingual Audio Intelligence System is to revolu
 - Storage: 10GB+ available space for application, models, and processing cache
 - GPU: Optional NVIDIA GPU with 4GB+ VRAM for accelerated processing
 - Network: Stable internet connection for initial model downloading
-- **Software:** Python 3.
+- **Software:** Python 3.8+, pip package manager, Docker (optional), web browser (Chrome, Firefox, Safari, Edge)
 
 ## 5. Setup Instructions
 
@@ -53,8 +53,8 @@ The primary objective of the Multilingual Audio Intelligence System is to revolu
 
 1. **Clone the Repository:**
    ```bash
-   git clone https://github.com/
-   cd
+   git clone https://github.com/Prathameshv07/Multilingual-Audio-Intelligence-System.git
+   cd Multilingual-Audio-Intelligence-System
    ```
 
 2. **Create and Activate Conda Environment:**
@@ -98,7 +98,7 @@ The primary objective of the Multilingual Audio Intelligence System is to revolu
 ## 6. Detailed Project Structure
 
 ```
-
+Multilingual-Audio-Intelligence-System/
 ├── web_app.py                # FastAPI application with RESTful endpoints
 ├── model_preloader.py        # Intelligent model loading with progress tracking
 ├── run_fastapi.py            # Application startup script with preloading
@@ -117,10 +117,9 @@ multilingual-audio-intelligence/
 ├── model_cache/              # Intelligent model caching directory
 ├── uploads/                  # User audio file storage
 ├── outputs/                  # Generated results and downloads
-├── requirements.txt
+├── requirements.txt          # Comprehensive dependency specification
 ├── Dockerfile                # Production-ready containerization
-
-├── config.example.env        # Environment configuration template
+└── config.example.env        # Environment configuration template
 ```
 
 ## 6.1 Demo Mode & Sample Files
@@ -129,8 +128,8 @@ The application ships with a professional demo mode for instant showcases withou
 
 - Demo files are automatically downloaded at startup (if missing) into `demo_audio/` and preprocessed into `demo_results/` for blazing-fast responses.
 - Available demos:
-  -
-  -
+  - [Yuri_Kizaki.mp3](https://www.mitsue.co.jp/service/audio_and_video/audio_production/media/narrators_sample/yuri_kizaki/03.mp3) – Japanese narration about website communication
+  - [Film_Podcast.mp3](https://www.lightbulblanguages.co.uk/resources/audio/film-podcast.mp3) – French podcast discussing films like The Social Network
 - Static serving: demo audio is exposed at `/demo_audio/<filename>` for local preview.
 - The UI provides two selectable cards under Demo Mode; once selected, the system loads a preview and renders a waveform using HTML5 Canvas (Web Audio API) before processing.
````
README.md
CHANGED

````diff
@@ -1,6 +1,12 @@
 # 🎵 Multilingual Audio Intelligence System
 
-
+![Multilingual Audio Intelligence System](static/imgs/banner.png)
+
+## Overview
+
+The Multilingual Audio Intelligence System is an advanced AI-powered platform that combines state-of-the-art speaker diarization, automatic speech recognition, and neural machine translation to deliver comprehensive audio analysis capabilities. This sophisticated system processes multilingual audio content, identifies individual speakers, transcribes speech with high accuracy, and provides intelligent translations across multiple languages, transforming raw audio into structured, actionable insights.
+
+## Features
 
 ### Demo Mode with Professional Audio Files
 - **Yuri Kizaki - Japanese Audio**: Professional voice message about website communication (23 seconds)
@@ -14,111 +20,81 @@
 - **Improved Transcript Display**: Color-coded confidence levels and clear translation sections
 - **Professional Audio Preview**: Audio player with waveform visualization
 
-### 
-- Automatic demo file download from original sources
-- Cached preprocessing results for instant demo response
-- Enhanced error handling for missing or corrupted demo files
-- Web Audio API integration for dynamic waveform generation
+### Screenshots
 
-
+#### 🎬 Demo Banner
 
-
-# Install dependencies
-pip install -r requirements.txt
+![Demo Banner](static/imgs/demo_banner.png)
 
-
-python run_fastapi.py
+#### 📝 Transcript with Translation
 
-
-# http://127.0.0.1:8000
-```
+![Transcript with Translation](static/imgs/demo_res_transcript_translate.png)
 
-
+#### 📊 Visual Representation
 
-
-
-
-4. **Process**: Click "Process Audio" for instant results
-5. **Explore**: View transcripts, translations, and analytics
+<p align="center">
+  <img src="static/imgs/demo_res_visual.png" alt="Visual Output"/>
+</p>
 
-
+#### 🧠 Summary Output
 
-
-2. **Preview**: View waveform and listen to your audio
-3. **Configure**: Select model size and target language
-4. **Process**: Real-time processing with progress tracking
-5. **Download**: Export results in JSON, SRT, or TXT format
+![Summary Output](static/imgs/demo_res_summary.png)
 
-## 
+## Installation and Quick Start
 
-
+1. **Clone the Repository:**
+   ```bash
+   git clone https://github.com/Prathameshv07/Multilingual-Audio-Intelligence-System.git
+   cd Multilingual-Audio-Intelligence-System
+   ```
 
-
+2. **Create and Activate Conda Environment:**
+   ```bash
+   conda create --name audio_challenge python=3.9
+   conda activate audio_challenge
+   ```
 
-
-
-
-
+3. **Install Dependencies:**
+   ```bash
+   pip install -r requirements.txt
+   ```
 
-
+4. **Configure Environment Variables:**
+   ```bash
+   cp config.example.env .env
+   # Edit .env file with your HUGGINGFACE_TOKEN for accessing gated models
+   ```
 
-
-
-
-
-- **Interactive Visualization** - Waveform analysis
-- **Multiple Export Formats** - JSON, SRT, TXT
+5. **Preload AI Models (Recommended):**
+   ```bash
+   python model_preloader.py
+   ```
 
-
-
-
-
-- **Uvicorn** - ASGI server
-- **PyTorch** - Deep learning framework
-- **pyannote.audio** - Speaker diarization
-- **faster-whisper** - Speech recognition
-- **Helsinki-NLP** - Neural translation
-
-### Frontend
-- **HTML5/CSS3** - Clean markup
-- **TailwindCSS** - Utility-first styling
-- **JavaScript (Vanilla)** - Client-side logic
-- **Plotly.js** - Interactive visualizations
-- **Font Awesome** - Professional icons
-
-## API Endpoints
-
-### Core Endpoints
-- `GET /` - Main application interface
-- `POST /api/upload` - Upload and process audio
-- `GET /api/status/{task_id}` - Check processing status
-- `GET /api/results/{task_id}` - Retrieve results
-- `GET /api/download/{task_id}/{format}` - Download outputs
-
-### Demo Endpoints
-- `POST /api/demo-process` - Quick demo processing
-- `GET /api/system-info` - System information
+6. **Initialize Application:**
+   ```bash
+   python run_fastapi.py
+   ```
 
 ## File Structure
 
 ```
 audio_challenge/
-├── web_app.py
-├── run_fastapi.py
-├── requirements.txt
+├── web_app.py               # FastAPI application
+├── run_fastapi.py           # Startup script
+├── requirements.txt         # Dependencies
 ├── templates/
-│   └── index.html
-├── src/
-│   ├── main.py
-│   ├── audio_processor.py
-│   ├── speaker_diarizer.py
+│   └── index.html           # Main interface
+├── src/                     # Core modules
+│   ├── main.py              # Pipeline orchestrator
+│   ├── audio_processor.py   # Audio preprocessing
+│   ├── speaker_diarizer.py  # Speaker identification
 │   ├── speech_recognizer.py # ASR with language detection
-│   ├── translator.py
-│   ├── output_formatter.py
-│   └── utils.py
-├── static/
-├── uploads/
-├── outputs/
+│   ├── translator.py        # Neural machine translation
+│   ├── output_formatter.py  # Output generation
+│   └── utils.py             # Utility functions
+├── static/                  # Static assets
+├── uploads/                 # Uploaded files
+├── outputs/                 # Generated outputs
 └── README.md
 ```
 
@@ -180,10 +156,6 @@ uvicorn web_app:app --host 0.0.0.0 --port 8000
 - Ensure all dependencies are installed
 - Check available system memory
 
-## License
-
-MIT License - See LICENSE file for details
-
 ## Support
 
 - **Documentation**: Check `/api/docs` endpoint
````
static/imgs/banner.png
ADDED (Git LFS)

static/imgs/demo_banner.png
ADDED (Git LFS)

static/imgs/demo_res_summary.png
ADDED (Git LFS)

static/imgs/demo_res_transcript_translate.png
ADDED (Git LFS)

static/imgs/demo_res_visual.png
ADDED (Git LFS)