---
title: Optimum-Nvidia - TensorRT-LLM optimized inference engines
emoji: π
colorFrom: green
colorTo: yellow
sdk: static
pinned: false
---

[Optimum-Nvidia](https://github.com/huggingface/optimum-nvidia) lets you easily leverage Nvidia's [TensorRT-LLM](https://github.com/NVIDIA/TensorRT-LLM) inference library through a seamless integration following the huggingface/transformers API.

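As a minimal sketch of that transformers-style integration (assuming `optimum-nvidia` is installed on a supported Nvidia GPU; the model id and the `use_fp8` flag are illustrative):

```python
# Sketch: loading a model through optimum-nvidia's drop-in transformers-style API.
# Assumes optimum-nvidia is installed and a supported Nvidia GPU is available;
# the model id and use_fp8 option are illustrative, not a guaranteed recipe.
from optimum.nvidia import AutoModelForCausalLM
from transformers import AutoTokenizer

model_id = "meta-llama/Llama-2-7b-chat-hf"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, use_fp8=True)

# Generation mirrors the familiar transformers workflow.
inputs = tokenizer("Hello, TensorRT-LLM!", return_tensors="pt").to("cuda")
outputs = model.generate(inputs.input_ids, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```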
This organisation holds prebuilt TensorRT-LLM compatible engines for various foundational models that you can use, fork, and deploy to get started as quickly as possible and benefit from out-of-the-box peak performance on Nvidia hardware.

Prebuilt engines are built, as much as possible, with the best options available, and updated models are pushed as features land in the TensorRT-LLM repository.
This can include (but is not limited to):
- Leveraging `float8` quantization on supported hardware (H100/L4/L40/RTX 40xx)
- Enabling `float8` or `int8` KV cache
- Enabling in-flight batching for dynamic batching when used in combination with Nvidia Triton Inference Server
- Enabling XQA attention kernels

Current engines target the following Nvidia Tensor Core GPUs; in each repository, look for the branch matching your targeted GPU:

- [4090 (sm_89)](https://huggingface.co/collections/optimum-nvidia/rtx-4090-optimized-tensorrt-llm-models-65e5ebc1240c11001a3e666b)

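Since engines live on GPU-specific branches, one way to fetch them is via `huggingface_hub`'s `snapshot_download`, whose `revision` parameter selects a branch. A sketch (the repo id and branch name below are hypothetical, assuming branches are named after the compute capability, e.g. `sm_89`):

```python
# Sketch: downloading a prebuilt engine from the branch matching your GPU.
# The repo_id is hypothetical; revision selects the GPU-specific branch,
# here assumed to be named after the compute capability (e.g. sm_89 for a 4090).
from huggingface_hub import snapshot_download

local_dir = snapshot_download(
    repo_id="optimum-nvidia/llama-2-7b-chat-hf",  # hypothetical repo name
    revision="sm_89",                             # assumed branch naming scheme
)
print(local_dir)  # local path containing the prebuilt engine files
```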
Feel free to open discussions and request support for new models through the community tab.

- The Optimum-Nvidia team at 🤗