metadata
title: MMS Translation API
emoji: π
colorFrom: blue
colorTo: green
sdk: docker
app_port: 7860
pinned: false
license: mit
suggested_hardware: t4-small
Translations API
A simple Flask API for translation services.
Getting Started
Development with Docker
- Build and run the development container:
docker compose up translations
The API will be available at http://localhost:5001
Available Endpoints
- GET / - Root endpoint with API information
- GET /health - Health check endpoint
- POST /transcribe - Audio transcription with forced alignment using the MMS model
- POST /align - Forced alignment for audio with a provided transcription
- GET /hello - Simple hello world endpoint
- POST /echo - Echo back request data
- GET /version - API version information
Environment Variables
- API_LOG_LEVEL - Set logging level (DEBUG, INFO, WARNING, ERROR)
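As an illustration of how a variable like API_LOG_LEVEL could be turned into a logging configuration, here is a minimal sketch. The function name `resolve_log_level` and the fallback to INFO are assumptions made for this example; the actual handling in env_vars.py may differ.

```python
import logging
import os

# The four levels listed in this README.
ALLOWED_LEVELS = {"DEBUG", "INFO", "WARNING", "ERROR"}


def resolve_log_level(environ=None, default="INFO"):
    """Map API_LOG_LEVEL to a logging constant (hypothetical helper).

    Unknown or missing values fall back to `default`; that fallback is
    an assumption of this sketch, not documented behavior of the API.
    """
    environ = os.environ if environ is None else environ
    name = str(environ.get("API_LOG_LEVEL", default)).strip().upper()
    if name not in ALLOWED_LEVELS:
        name = default
    return getattr(logging, name)
```

A startup script could then call `logging.basicConfig(level=resolve_log_level())` before creating the Flask app.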
Testing
The API includes basic endpoints for testing:
# Health check
curl http://localhost:5001/health
# Hello world
curl http://localhost:5001/hello
# Echo test
curl -X POST http://localhost:5001/echo \
-H "Content-Type: application/json" \
-d '{"test": "data"}'
# Audio transcription (requires audio file)
curl -X POST http://localhost:5001/transcribe \
-F "audio=@path/to/your/audio.wav"
# Forced alignment (requires audio file + transcription text)
curl -X POST http://localhost:5001/align \
-F "audio=@path/to/your/audio.wav" \
-F "transcription=Hello world this is a test"
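The echo test can also be driven from Python using only the standard library. This is a sketch: the helper names `build_echo_request` and `send` are made up for illustration, and because the response format is not documented here, the body is returned as raw text.

```python
import json
import urllib.request

BASE_URL = "http://localhost:5001"  # development address from this README


def build_echo_request(payload, base_url=BASE_URL):
    """Build the same POST /echo request as the curl example (hypothetical helper)."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/echo",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request, timeout=10):
    """Send a prepared request; returns (status code, response body as text)."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return resp.status, resp.read().decode("utf-8")
```

With the container running, `send(build_echo_request({"test": "data"}))` performs the call.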
Model Setup
Before running the API, you need to download the MMS model files:
# Create models directory
mkdir -p server/models
# Download MMS model checkpoint (7B parameters, ~14GB)
wget https://dl.fbaipublicfiles.com/mms/mms_XRI.pt -O server/models/mms_XRI.pt
# Download tokenizer model (~6MB)
wget https://dl.fbaipublicfiles.com/mms/mms_1143_langs_tokenizer_spm.model -O server/models/mms_1143_langs_tokenizer_spm.model
Note: The model files are large (especially mms_XRI.pt at ~14GB) and are excluded from git via .gitignore. Make sure you have sufficient disk space and a stable internet connection for the download.
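Since an interrupted download can leave a file missing or empty, it may help to check the models directory before starting the API. The sketch below only tests for presence and non-zero size; the function name `missing_model_files` is hypothetical and nothing in it verifies checksums or exact sizes.

```python
from pathlib import Path

# File names taken from the download commands above.
MODEL_FILES = ("mms_XRI.pt", "mms_1143_langs_tokenizer_spm.model")


def missing_model_files(models_dir="server/models"):
    """Return the expected model files that are absent or empty."""
    root = Path(models_dir)
    missing = []
    for name in MODEL_FILES:
        path = root / name
        if not path.is_file() or path.stat().st_size == 0:
            missing.append(name)
    return missing
```

An empty list from `missing_model_files()` means both files are present and non-empty.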
Project Structure
translations/
├── Dockerfile
├── docker-compose.yaml
├── requirements.txt
├── README.md
└── server/
    ├── server.py                  # Main Flask application with model loading
    ├── model.py                   # MMS model implementation and inference
    ├── translations_blueprint.py  # API routes including transcription
    ├── env_vars.py                # Environment configuration
    ├── run.sh                     # Production startup script
    ├── run_tests.sh               # Test runner script
    └── models/                    # Model files directory (gitignored)
        ├── .gitignore             # Ignores model files from git
        ├── mms_XRI.pt             # MMS model checkpoint (~14GB)
        └── mms_1143_langs_tokenizer_spm.model  # Tokenizer model (~6MB)
Key Components
- MMS Model: Meta's Massively Multilingual Speech model for audio transcription
- Forced Alignment: Timestamp alignment between transcription and audio
- GPU Support: CUDA-enabled inference with NVIDIA Container Toolkit
- Singleton Pattern: Model loaded once at startup to prevent GPU memory issues
- Audio Processing: Librosa-based audio preprocessing and normalization
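The singleton idea in the list above can be sketched as follows. This is not the code in server.py or model.py: the class name `ModelHolder` and the `loader` callback are assumptions for illustration, and the sketch loads lazily on first use, so calling `get` once during startup would give the load-at-startup behavior described.

```python
import threading


class ModelHolder:
    """Keeps one shared model instance per process (illustrative only)."""

    _instance = None
    _lock = threading.Lock()

    @classmethod
    def get(cls, loader):
        """Return the shared model, calling `loader()` only the first time."""
        if cls._instance is None:
            with cls._lock:
                # Re-check inside the lock so concurrent first calls
                # cannot both run the expensive load.
                if cls._instance is None:
                    cls._instance = loader()
        return cls._instance
```

Every request handler that calls `ModelHolder.get(...)` receives the same object, so the checkpoint is placed in GPU memory only once.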