---
title: README
emoji: 💻
colorFrom: purple
colorTo: blue
sdk: static
pinned: false
---
# Software-Delivered AI Inference
Neural Magic helps developers accelerate deep learning performance with automated model compression technologies and inference engines.
Download our compression-aware inference engines and open source tools for fast model inference.
* [nm-vllm](https://neuralmagic.com/nm-vllm/): A high-throughput and memory-efficient inference engine for LLMs, our supported enterprise distribution of [vLLM](https://github.com/vllm-project/vllm).
* [DeepSparse](https://github.com/neuralmagic/deepsparse): An inference runtime offering accelerated performance on CPUs, with APIs to integrate ML into your application.
* [llm-compressor](https://github.com/vllm-project/llm-compressor/): Libraries for applying sparsification and quantization recipes to neural networks with a few lines of code, enabling faster and smaller models.

In this profile we provide accurate model checkpoints compressed with SOTA methods and ready to run in vLLM, including W4A16, W8A16, W8A8 (INT8 and FP8), and many more! If you would like help quantizing a model, or have a request for a checkpoint we should add, please open an issue in [llm-compressor](https://github.com/vllm-project/llm-compressor).
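To give a feel for what a weight-quantization scheme such as W8A8 (INT8) does, here is a minimal, self-contained sketch of symmetric per-tensor INT8 quantization. This is an illustration of the underlying idea only, not Neural Magic's implementation; llm-compressor applies such recipes per layer, typically with calibration data.

```python
def quantize_int8(weights):
    """Symmetric per-tensor INT8 quantization:
    pick a scale so the largest |weight| maps to 127."""
    scale = max(abs(w) for w in weights) / 127.0
    q = [round(w / scale) for w in weights]  # integers in [-127, 127]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from INT8 values."""
    return [v * scale for v in q]

weights = [0.5, -1.27, 0.02, 1.0]
q, scale = quantize_int8(weights)   # q == [50, -127, 2, 100]
restored = dequantize(q, scale)     # close to the original weights
```

Storing 8-bit integers plus one scale per tensor (or per channel) is what shrinks the checkpoint and speeds up inference; the small rounding error introduced here is what quantization-aware methods work to minimize.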
