AI & ML interests

LLMs, optimization, compression, sparsification, quantization, pruning, distillation, NLP, CV

neuralmagic's collections (14)

Sparse-Llama-3.1-2of4
2:4 sparse versions of Llama-3.1, including transfer learning
FP8 LLMs for vLLM
Accurate FP8 quantized models by Neural Magic, ready for use with vLLM (see the usage sketch after this list)
INT4 LLMs for vLLM
Accurate INT4 quantized models by Neural Magic, ready for use with vLLM!
Compression Papers
Papers that we're proud to integrate into our libraries
Sparse Finetuning MPT
Explore our breakthrough in sparse fine-tuning LLMs! Our novel method maintains downstream accuracy even with >70% sparsity.
Vision Language Models Quantization
Vision Language Models (VLMs) quantized by Neural Magic
Llama-3.2 Quantization
Llama 3.2 models quantized by Neural Magic
INT8 LLMs for vLLM
Accurate INT8 quantized models by Neural Magic, ready for use with vLLM!
Sparse Foundational Llama 2 Models
Sparse pre-trained and fine-tuned Llama models made by Neural Magic + Cerebras
DeepSparse Sparse LLMs
Useful LLMs for DeepSparse where we've pruned at least 50% of the weights!
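
The quantized collections above (FP8, INT8, INT4) are published in formats vLLM can load directly. Below is a minimal sketch of what that looks like, assuming vLLM is installed and a supported GPU is available; the model ID is an illustrative example of an FP8 checkpoint, and any model from these collections can be substituted.

```python
# Minimal sketch: running a Neural Magic FP8-quantized model with vLLM.
# The model ID is illustrative; swap in any checkpoint from the
# FP8 / INT8 / INT4 collections listed above.
from vllm import LLM, SamplingParams

llm = LLM(model="neuralmagic/Meta-Llama-3.1-8B-Instruct-FP8")

params = SamplingParams(temperature=0.7, max_tokens=128)
outputs = llm.generate(
    ["Explain 2:4 structured sparsity in one sentence."], params
)

# Each result holds the generated completions for one prompt.
for out in outputs:
    print(out.outputs[0].text)
```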