---
title: MMS Translation API
emoji: 🌍
colorFrom: blue
colorTo: green
sdk: docker
app_port: 7860
pinned: false
license: mit
suggested_hardware: t4-small
---

Translations API

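A simple Flask API for translation services. It currently exposes audio transcription and forced-alignment endpoints built on Meta's Massively Multilingual Speech (MMS) model.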

Getting Started

Development with Docker

  1. Download the MMS model files as described under Model Setup.
  2. Build and run the development container:
docker compose up translations

The API will be available at http://localhost:5001
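
Because the model is loaded once at startup and the checkpoint is large, the server may need some time before it can answer requests. A minimal standard-library sketch for scripts that should wait until the API is up, assuming GET /health returns HTTP 200 once the server is ready; `wait_for_health` and its timeout defaults are illustrative and not part of this repository.

```python
import time
import urllib.request


def wait_for_health(base_url: str = "http://localhost:5001",
                    timeout: float = 600.0,
                    interval: float = 5.0) -> bool:
    """Poll GET /health until it returns HTTP 200 or `timeout` seconds pass.

    Returns True once the API answers, False if the deadline is reached.
    The 600 s default is an assumed allowance for model loading.
    """
    url = base_url.rstrip("/") + "/health"
    deadline = time.monotonic() + timeout
    while True:
        try:
            with urllib.request.urlopen(url, timeout=interval) as resp:
                if resp.status == 200:
                    return True
        except OSError:
            # Connection refused, socket timeout, or an HTTP error status:
            # urllib.error.URLError and HTTPError are OSError subclasses.
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

For example, `wait_for_health("http://localhost:5001", timeout=900)` returns True as soon as the health check succeeds and False if it never does within 15 minutes.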

Available Endpoints

  • GET / - Root endpoint with API information
  • GET /health - Health check endpoint
  • POST /transcribe - Audio transcription with forced alignment using the MMS model
  • POST /align - Forced alignment for audio with provided transcription
  • GET /hello - Simple hello world endpoint
  • POST /echo - Echo back request data
  • GET /version - API version information

Environment Variables

  • API_LOG_LEVEL - Set logging level (DEBUG, INFO, WARNING, ERROR)
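
The accepted values match Python's standard `logging` level names. The sketch below shows one way such a variable could be mapped onto `logging`; `log_level_from_env` is a hypothetical helper (the project's env_vars.py may handle this differently), and falling back to INFO for unknown or missing values is an assumption.

```python
import logging
import os
from typing import Mapping, Optional

# Values the README lists as valid for API_LOG_LEVEL.
ALLOWED_LEVELS = ("DEBUG", "INFO", "WARNING", "ERROR")


def log_level_from_env(environ: Optional[Mapping[str, str]] = None,
                       default: str = "INFO") -> int:
    """Translate API_LOG_LEVEL into a `logging` level constant.

    Hypothetical helper: the value is read case-insensitively, and
    unknown or missing values fall back to `default` (assumed INFO).
    """
    env = os.environ if environ is None else environ
    name = env.get("API_LOG_LEVEL", default).strip().upper()
    if name not in ALLOWED_LEVELS:
        name = default
    return getattr(logging, name)


# Configure the root logger from the process environment.
logging.basicConfig(level=log_level_from_env())
```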

Testing

The following curl commands exercise the API. The first three are quick smoke tests; the transcription and alignment calls need an audio file:

# Health check
curl http://localhost:5001/health

# Hello world
curl http://localhost:5001/hello

# Echo test
curl -X POST http://localhost:5001/echo \
  -H "Content-Type: application/json" \
  -d '{"test": "data"}'

# Audio transcription (requires audio file)
curl -X POST http://localhost:5001/transcribe \
  -F "audio=@path/to/your/audio.wav"

# Forced alignment (requires audio file + transcription text)
curl -X POST http://localhost:5001/align \
  -F "audio=@path/to/your/audio.wav" \
  -F "transcription=Hello world this is a test"
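
The curl examples above can also be scripted in Python. This sketch uses only the standard library; the form field names `audio` and `transcription` are taken from the curl commands, while the helper names, the 300-second timeout and the assumption that both endpoints reply with JSON are illustrative.

```python
import json
import mimetypes
import urllib.request
import uuid
from pathlib import Path
from typing import Dict, Tuple


def encode_multipart(fields: Dict[str, str],
                     files: Dict[str, Path]) -> Tuple[bytes, str]:
    """Build a multipart/form-data body, similar to what `curl -F` sends.

    Returns (body, value for the Content-Type header).
    """
    boundary = uuid.uuid4().hex
    chunks = []
    for name, value in fields.items():
        header = (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
        )
        chunks.append(header.encode() + value.encode("utf-8") + b"\r\n")
    for name, path in files.items():
        ctype = mimetypes.guess_type(path.name)[0] or "application/octet-stream"
        header = (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"; '
            f'filename="{path.name}"\r\n'
            f"Content-Type: {ctype}\r\n\r\n"
        )
        chunks.append(header.encode() + path.read_bytes() + b"\r\n")
    body = b"".join(chunks) + f"--{boundary}--\r\n".encode()
    return body, f"multipart/form-data; boundary={boundary}"


def post_form(url: str, fields: Dict[str, str], files: Dict[str, Path],
              timeout: float = 300.0) -> dict:
    """POST a multipart form and decode the reply (assumed to be JSON)."""
    body, content_type = encode_multipart(fields, files)
    request = urllib.request.Request(
        url, data=body, method="POST", headers={"Content-Type": content_type}
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def transcribe(base_url: str, audio: Path) -> dict:
    """Same request as the curl /transcribe example."""
    return post_form(f"{base_url.rstrip('/')}/transcribe", {}, {"audio": audio})


def align(base_url: str, audio: Path, transcription: str) -> dict:
    """Same request as the curl /align example."""
    return post_form(f"{base_url.rstrip('/')}/align",
                     {"transcription": transcription}, {"audio": audio})
```

For example, `align("http://localhost:5001", Path("path/to/your/audio.wav"), "Hello world this is a test")` sends the same request as the curl /align command.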

Model Setup

Before running the API, you need to download the MMS model files:

# Create models directory
mkdir -p server/models

# Download MMS model checkpoint (7B parameters, ~14GB)
wget https://dl.fbaipublicfiles.com/mms/mms_XRI.pt -O server/models/mms_XRI.pt

# Download tokenizer model (~6MB)
wget https://dl.fbaipublicfiles.com/mms/mms_1143_langs_tokenizer_spm.model -O server/models/mms_1143_langs_tokenizer_spm.model

Note: The model files are large (especially mms_XRI.pt at ~14GB) and are excluded from git via .gitignore. Make sure you have sufficient disk space and a stable internet connection for the download.
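
Since the API needs both files, a quick pre-flight check can catch a missing or interrupted download before the container is started. A minimal sketch, assuming the `server/models` layout used by the commands above; `check_model_files` is a hypothetical helper, not a script shipped with the repository. Zero-byte files are flagged because `wget -O` creates the output file before downloading, so a failed download can leave an empty file behind.

```python
from pathlib import Path
from typing import List

# File names taken from the download commands in this section.
MODEL_FILES = ("mms_XRI.pt", "mms_1143_langs_tokenizer_spm.model")


def check_model_files(models_dir: Path = Path("server/models")) -> List[str]:
    """Return a list of problems; an empty list means both files are present.

    Only existence and a non-zero size are checked, not file integrity.
    """
    problems = []
    for name in MODEL_FILES:
        path = models_dir / name
        if not path.is_file():
            problems.append(f"missing: {path}")
        elif path.stat().st_size == 0:
            problems.append(f"empty (possibly an interrupted download): {path}")
    return problems
```

Run from the repository root, `check_model_files()` returning `[]` means both files exist and are non-empty; it does not verify checksums.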

Project Structure

translations/
├── Dockerfile
├── docker-compose.yaml
├── requirements.txt
├── README.md
└── server/
    ├── server.py                   # Main Flask application with model loading
    ├── model.py                    # MMS model implementation and inference
    ├── translations_blueprint.py   # API routes including transcription
    ├── env_vars.py                 # Environment configuration
    ├── run.sh                      # Production startup script
    ├── run_tests.sh                # Test runner script
    └── models/                     # Model files directory (gitignored)
        ├── .gitignore              # Ignores model files from git
        ├── mms_XRI.pt              # MMS model checkpoint (~14GB)
        └── mms_1143_langs_tokenizer_spm.model  # Tokenizer model (~6MB)

Key Components

  • MMS Model: Meta's Massively Multilingual Speech model for audio transcription
  • Forced Alignment: Timestamp alignment between transcription and audio
  • GPU Support: CUDA-enabled inference with NVIDIA Container Toolkit
  • Singleton Pattern: The model is loaded once at startup and reused by every request, so only one copy occupies GPU memory
  • Audio Processing: Librosa-based audio preprocessing and normalization
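
The singleton idea can be illustrated with a small per-process holder: the first call runs the loader, and every later call returns the same object, so an expensive load (such as moving weights onto the GPU) happens only once per process. A minimal sketch; `ModelHolder` and the `loader` callable are hypothetical stand-ins for whatever server.py and model.py actually do.

```python
import threading
from typing import Any, Callable, Optional


class ModelHolder:
    """Load an expensive model at most once per process (singleton).

    Hypothetical sketch: `loader` stands in for the real MMS model
    construction; it is not this repository's API.
    """

    _instance: Optional[Any] = None
    _lock = threading.Lock()

    @classmethod
    def get(cls, loader: Callable[[], Any]) -> Any:
        # Fast path: once loaded, no locking is needed.
        if cls._instance is None:
            with cls._lock:
                # Re-check inside the lock so two concurrent first calls
                # cannot both run the loader (double-checked locking).
                if cls._instance is None:
                    cls._instance = loader()
        return cls._instance
```

Note that this is a per-process singleton: if the production startup script runs several worker processes, each process would still load its own copy of the model.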