---
pipeline_tag: text-to-speech
library_name: transformers
---
# Optimized MMS-TTS-ENG with ONNX Runtime
This repository contains an optimized version of the `facebook/mms-tts-eng` Text-to-Speech model for fast CPU inference using ONNX Runtime and dynamic quantization. It demonstrates how to convert the model to ONNX, quantize it, and run inference efficiently. It also includes an example of uploading the converted model and tokenizer to the Hugging Face Hub.
## Features
* **ONNX Conversion:** Converts the `facebook/mms-tts-eng` PyTorch model to ONNX format for optimized inference.
* **Dynamic Quantization:** Applies dynamic quantization (float32 to int8) to reduce model size and improve CPU inference speed.
* **Fast CPU Inference:** Leverages ONNX Runtime for efficient CPU-based speech generation.
* **Google Colab Compatible:** Provides complete, runnable code examples for Google Colab.
* **Hugging Face Hub Integration:** Includes code to upload the converted model and tokenizer to the Hugging Face Hub for easy sharing and deployment.
* **Seeded Generation:** Includes an example of seeded generation for reproducible outputs (the same seed yields the same audio; different seeds still produce different outputs).
* **Speed Comparison:** Demonstrates how to compare the inference speed of the ONNX Runtime optimized model with the original PyTorch model (with `torch.compile`).
## Requirements
* Python 3.7+
* `transformers`
* `accelerate`
* `scipy`
* `onnxruntime`
* `optimum`
* `onnx`
* `huggingface_hub`
You can install the required packages using pip:
```bash
pip install --upgrade transformers accelerate scipy onnxruntime optimum onnx huggingface_hub
```