# WhisperLive-TensorRT

We have only tested the TensorRT backend in Docker, so we recommend Docker for a smooth TensorRT backend setup.

**Note**: We use `tensorrt_llm==0.18.2`.

## Installation

- Install [docker](https://docs.docker.com/engine/install/)
- Install [nvidia-container-toolkit](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html)
- Run WhisperLive TensorRT in docker

```bash
docker build . -f docker/Dockerfile.tensorrt -t whisperlive-tensorrt
docker run -p 9090:9090 --runtime=nvidia --gpus all --entrypoint /bin/bash -it whisperlive-tensorrt
```

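Because only `tensorrt_llm==0.18.2` has been tested, it can help to confirm the version inside the container before building engines. The following is a minimal sketch. The function name `check_trt_llm_version` is hypothetical. It assumes you read the installed version from `pip` metadata.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeed only if the given version string matches the
# TensorRT-LLM version this README was tested with.
EXPECTED_TRT_LLM_VERSION="0.18.2"

check_trt_llm_version() {
  local installed="$1"
  if [ "$installed" = "$EXPECTED_TRT_LLM_VERSION" ]; then
    echo "tensorrt_llm $installed matches the tested version"
    return 0
  fi
  # Empty input means pip reported nothing, i.e. the package is missing.
  echo "warning: tensorrt_llm '${installed:-not installed}' differs from tested $EXPECTED_TRT_LLM_VERSION" >&2
  return 1
}
```

Inside the container you could call it as `check_trt_llm_version "$(python3 -m pip show tensorrt_llm 2>/dev/null | awk '/^Version:/ {print $2}')"`.
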

## Whisper TensorRT Engine

- Below, we build the `small.en` and multilingual `small` TensorRT engines as examples. The script logs the path of the directory that contains the Whisper TensorRT engine. Pass that path to the server as `--trt_model_path`.

```bash
# convert small.en
bash build_whisper_tensorrt.sh /app/TensorRT-LLM-examples small.en        # float16
bash build_whisper_tensorrt.sh /app/TensorRT-LLM-examples small.en int8   # int8 weight-only quantization
bash build_whisper_tensorrt.sh /app/TensorRT-LLM-examples small.en int4   # int4 weight-only quantization

# convert small multilingual model
bash build_whisper_tensorrt.sh /app/TensorRT-LLM-examples small
```

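You can use the two float16 engine paths used later in this README (`whisper_small_en_float16` for `small.en`, `whisper_small_float16` for `small`) to guess where an engine lands. The sketch below rests on assumptions. The helper `expected_engine_dir` is hypothetical. The naming rule (dots replaced by underscores, precision appended) is inferred only from those two examples. The int8/int4 directory names are unverified. The path the build script logs is always authoritative.

```bash
#!/usr/bin/env bash
# Hypothetical helper: guess the engine directory produced by
# build_whisper_tensorrt.sh. Inferred from the README's float16 examples
# (small.en -> whisper_small_en_float16, small -> whisper_small_float16);
# names for int8/int4 builds are an unverified assumption.
TRT_EXAMPLES_DIR="/app/TensorRT-LLM-examples"

expected_engine_dir() {
  local model="$1"
  local precision="${2:-float16}"   # no precision argument means a float16 build
  # Replace every "." in the model name with "_" (small.en -> small_en).
  echo "${TRT_EXAMPLES_DIR}/whisper/whisper_${model//./_}_${precision}"
}
```

For example, `ls -d "$(expected_engine_dir small.en)"` checks whether the guessed directory exists after a build.
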

## Run WhisperLive Server with TensorRT Backend

```bash
# Run English-only model
python3 run_server.py --port 9090 \
                      --backend tensorrt \
                      --trt_model_path "/app/TensorRT-LLM-examples/whisper/whisper_small_en_float16"

# Run multilingual model
python3 run_server.py --port 9090 \
                      --backend tensorrt \
                      --trt_model_path "/app/TensorRT-LLM-examples/whisper/whisper_small_float16" \
                      --trt_multilingual
```

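The two commands above differ only in the engine path and in `--trt_multilingual`. Whisper checkpoints whose names end in `.en` are English-only, so the flag can be derived from the model name. The following is a minimal sketch. The function `build_server_args` and the global array `SERVER_ARGS` are hypothetical names. It assumes every model name without the `.en` suffix is multilingual.

```bash
#!/usr/bin/env bash
# Hypothetical helper: fill SERVER_ARGS with the run_server.py command for a
# Whisper model name and its TensorRT engine directory. Names ending in ".en"
# are English-only; any other name is treated as multilingual.
declare -a SERVER_ARGS=()

build_server_args() {
  local model="$1"
  local engine_dir="$2"
  SERVER_ARGS=(python3 run_server.py --port 9090 --backend tensorrt --trt_model_path "$engine_dir")
  if [[ "$model" != *.en ]]; then
    SERVER_ARGS+=(--trt_multilingual)
  fi
}
```

To launch, run `build_server_args small "/app/TensorRT-LLM-examples/whisper/whisper_small_float16" && "${SERVER_ARGS[@]}"`. Keeping the command in an array rather than a string preserves paths that contain spaces.
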

By default, the TensorRT backend uses the C++ session (`cpp_session`). To use the Python session instead, pass `--trt_py_session` to `run_server.py`:

```bash
python3 run_server.py --port 9090 \
                      --backend tensorrt \
                      --trt_model_path "/app/TensorRT-LLM-examples/whisper/whisper_small_float16" \
                      --trt_multilingual \
                      --trt_py_session
```
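
If you switch between sessions often, you can map a session name to the extra flag. The following is a small sketch. The helper `trt_session_flag` and the `TRT_SESSION` environment variable in the usage line are hypothetical. The script itself only knows `--trt_py_session`.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the extra run_server.py flag for a session name.
# "cpp" (or empty) is the default C++ session and needs no flag;
# "py" selects the Python session via --trt_py_session.
trt_session_flag() {
  case "$1" in
    cpp|"") ;;   # default session: print nothing
    py) echo "--trt_py_session" ;;
    *) echo "unknown session '$1' (expected cpp or py)" >&2; return 1 ;;
  esac
}
```

You could then append `$(trt_session_flag "${TRT_SESSION:-cpp}")` unquoted to a `run_server.py` command. When the output is empty, it expands to no argument at all.
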