metadata

title: MMS Translation API
emoji: 🌍
colorFrom: blue
colorTo: green
sdk: docker
app_port: 7860
pinned: false
license: mit
suggested_hardware: t4-small

Translations API

A simple Flask API for speech services: audio transcription and forced alignment with Meta's MMS (Massively Multilingual Speech) model.

Getting Started

Development with Docker

Build and run the development container:

docker compose up translations

The API will be available at http://localhost:5001
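The model is loaded once at startup, so the container may take a while before it answers requests. A minimal polling sketch is below. It assumes `GET /health` returns a 2xx status once the server is ready. The helper name `wait_for_health` and the 600-second default budget are illustrative choices, not part of this project.

```shell
#!/bin/sh
# Hypothetical helper: poll GET /health until it succeeds or the time budget runs out.
# Usage: wait_for_health [BASE_URL] [TIMEOUT_SECONDS]
# Assumption: /health answers with a 2xx status once the model is loaded.
wait_for_health() {
  local base_url="${1:-http://localhost:5001}"
  local timeout="${2:-600}"   # arbitrary default, not a measured load time
  local waited=0

  # Only sleep time is counted, so the real wait can be longer if requests hang.
  while [ "$waited" -lt "$timeout" ]; do
    # -s hides progress output; -f turns HTTP errors (4xx/5xx) into a non-zero exit.
    if curl -sf --max-time 5 "${base_url}/health" > /dev/null; then
      echo "API is up at ${base_url}"
      return 0
    fi
    sleep 2
    waited=$((waited + 2))
  done

  echo "Gave up after ${timeout}s waiting for ${base_url}/health" >&2
  return 1
}
```

For example, start the service in the background with `docker compose up -d translations` and then run `wait_for_health` before sending requests.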

Available Endpoints

GET / - Root endpoint with API information
GET /health - Health check endpoint
POST /transcribe - Audio transcription with forced alignment using the MMS model
POST /align - Forced alignment for audio with provided transcription
GET /hello - Simple hello world endpoint
POST /echo - Echo back request data
GET /version - API version information
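The four GET endpoints above take no input, so they can be swept in one pass. The sketch below prints each path with its HTTP status and exits non-zero if any status is not 200. Treating 200 as the only success code is an assumption, and `check_get_endpoints` is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: print "<path> <status>" for each GET endpoint listed above.
# Returns non-zero if any endpoint does not answer 200 (assumed success code).
check_get_endpoints() {
  local base_url="${1:-http://localhost:5001}"
  local failures=0
  local path code

  for path in / /health /hello /version; do
    # -o discards the body; -w prints only the status code (000 if no response).
    code=$(curl -s -o /dev/null --max-time 10 -w '%{http_code}' "${base_url}${path}") || true
    # Treat an empty result (for example, curl not installed) as 000 too.
    code="${code:-000}"
    echo "${path} ${code}"
    if [ "$code" != "200" ]; then
      failures=$((failures + 1))
    fi
  done

  [ "$failures" -eq 0 ]
}
```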

Environment Variables

API_LOG_LEVEL - Set logging level (DEBUG, INFO, WARNING, ERROR)
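This README does not document how the server handles values outside the four listed levels or lowercase input. A launcher script could normalize and validate the value before starting anything. Here is a minimal sketch using a hypothetical `normalize_log_level` helper:

```shell
#!/bin/sh
# Hypothetical helper: upper-case an API_LOG_LEVEL value and reject anything
# other than the four levels documented in this README.
normalize_log_level() {
  local level
  level=$(printf '%s' "${1:-}" | tr '[:lower:]' '[:upper:]')
  case "$level" in
    DEBUG|INFO|WARNING|ERROR)
      printf '%s\n' "$level"
      ;;
    *)
      echo "Unsupported API_LOG_LEVEL: '${1:-}'" >&2
      return 1
      ;;
  esac
}

# Example: API_LOG_LEVEL=$(normalize_log_level debug) && export API_LOG_LEVEL
```

Whether an exported variable reaches the container depends on how docker-compose.yaml passes environment variables, which is not shown here.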

Testing

Example curl requests against a running instance:

# Health check
curl http://localhost:5001/health

# Hello world
curl http://localhost:5001/hello

# Echo test
curl -X POST http://localhost:5001/echo \
  -H "Content-Type: application/json" \
  -d '{"test": "data"}'

# Audio transcription (requires audio file)
curl -X POST http://localhost:5001/transcribe \
  -F "audio=@path/to/your/audio.wav"

# Forced alignment (requires audio file + transcription text)
curl -X POST http://localhost:5001/align \
  -F "audio=@path/to/your/audio.wav" \
  -F "transcription=Hello world this is a test"
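For repeated use, the two upload requests above can be wrapped so that missing local files are caught before anything is sent. The wrapper names `transcribe_file` and `align_file` are hypothetical. The second wrapper uses curl's `name=<file` form syntax, which sends a file's contents (including any trailing newline) as a plain text field. This is handy for transcriptions too long to type on the command line.

```shell
#!/bin/sh
# Hypothetical wrapper around the /transcribe request above.
transcribe_file() {
  local base_url="$1" audio="$2"
  if [ ! -f "$audio" ]; then
    echo "Audio file not found: ${audio}" >&2
    return 1
  fi
  curl -sS -f -X POST "${base_url}/transcribe" -F "audio=@${audio}"
}

# Hypothetical wrapper around the /align request above; the transcription
# text is read from a file instead of being passed inline.
align_file() {
  local base_url="$1" audio="$2" transcript="$3"
  local f
  for f in "$audio" "$transcript"; do
    if [ ! -f "$f" ]; then
      echo "File not found: ${f}" >&2
      return 1
    fi
  done
  # "@file" uploads the file; "<file" sends its contents as a text field.
  curl -sS -f -X POST "${base_url}/align" \
    -F "audio=@${audio}" \
    -F "transcription=<${transcript}"
}

# Example: align_file http://localhost:5001 path/to/your/audio.wav transcript.txt
```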

Model Setup

Before running the API, you need to download the MMS model files:

# Create models directory
mkdir -p server/models

# Download MMS model checkpoint (7B parameters, ~14GB)
wget https://dl.fbaipublicfiles.com/mms/mms_XRI.pt -O server/models/mms_XRI.pt

# Download tokenizer model (~6MB)
wget https://dl.fbaipublicfiles.com/mms/mms_1143_langs_tokenizer_spm.model -O server/models/mms_1143_langs_tokenizer_spm.model

Note: The model files are large (mms_XRI.pt alone is ~14GB) and are kept out of git by server/models/.gitignore. Make sure you have at least ~14GB of free disk space and a stable internet connection before downloading.
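Because the checkpoint is large, a download script can check free space first and skip files that are already present. The sketch below uses two hypothetical helpers, `has_free_space` and `fetch_model_file`. It writes each download to a `.part` file and renames it only after `wget` succeeds, so an interrupted transfer is never mistaken for a complete model. The 15GB threshold in the usage comment is an illustrative margin over the ~14GB checkpoint.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the filesystem holding DIR (which must exist)
# has at least REQUIRED_GB gigabytes available.
has_free_space() {
  local dir="$1" required_gb="$2" avail_kb
  # df -P prints one line per filesystem; column 4 is available space in 1K blocks.
  avail_kb=$(df -Pk "$dir" | awk 'NR == 2 { print $4 }')
  [ "${avail_kb:-0}" -ge $((required_gb * 1024 * 1024)) ]
}

# Hypothetical helper: download URL to DEST unless DEST already exists and is non-empty.
fetch_model_file() {
  local url="$1" dest="$2"
  if [ -s "$dest" ]; then
    echo "skip ${dest} (already present)"
    return 0
  fi
  mkdir -p "$(dirname "$dest")"
  # Rename only on success so a partial download never looks complete.
  wget -O "${dest}.part" "$url" && mv "${dest}.part" "$dest"
}

# Example:
#   mkdir -p server/models
#   has_free_space server/models 15 || echo "Not enough disk space" >&2
#   fetch_model_file https://dl.fbaipublicfiles.com/mms/mms_XRI.pt server/models/mms_XRI.pt
```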

Project Structure

translations/
├── Dockerfile
├── docker-compose.yaml
├── requirements.txt
├── README.md
└── server/
    ├── server.py                    # Main Flask application with model loading
    ├── model.py                     # MMS model implementation and inference
    ├── translations_blueprint.py    # API routes including transcription
    ├── env_vars.py                  # Environment configuration
    ├── run.sh                       # Production startup script
    ├── run_tests.sh                 # Test runner script
    └── models/                      # Model files directory (gitignored)
        ├── .gitignore               # Ignores model files from git
        ├── mms_XRI.pt               # MMS model checkpoint (~14GB)
        └── mms_1143_langs_tokenizer_spm.model  # Tokenizer model (~6MB)

Key Components

MMS Model: Meta's Massively Multilingual Speech model for audio transcription
Forced Alignment: Timestamp alignment between a transcription and its audio
GPU Support: CUDA-enabled inference inside Docker via the NVIDIA Container Toolkit
Singleton Pattern: The model is loaded once at startup and shared across requests, so only one copy occupies GPU memory
Audio Processing: Librosa-based audio preprocessing and normalization
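Since inference runs on CUDA, a quick host-side check can catch a missing NVIDIA driver before the container starts. This sketch (the helper `check_host_gpu` is hypothetical) only confirms that `nvidia-smi` on the host can see a GPU. It does not exercise the NVIDIA Container Toolkit itself. That is usually checked by running `nvidia-smi` inside a throwaway container, for example `docker run --rm --gpus all ubuntu nvidia-smi`.

```shell
#!/bin/sh
# Hypothetical preflight: confirm the host NVIDIA driver reports at least one GPU.
check_host_gpu() {
  local gpus
  if ! command -v nvidia-smi > /dev/null 2>&1; then
    echo "nvidia-smi not found; is the NVIDIA driver installed on this host?" >&2
    return 1
  fi
  # nvidia-smi -L prints one line per GPU the driver can see.
  gpus=$(nvidia-smi -L 2>/dev/null) || gpus=""
  if [ -z "$gpus" ]; then
    echo "nvidia-smi did not report any GPU" >&2
    return 1
  fi
  printf '%s\n' "$gpus"
}
```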