LLaDA2.0: Scaling Up Diffusion Language Models to 100B Paper • 2512.15745 • Published 14 days ago • 73
MoE-Mamba: Efficient Selective State Space Models with Mixture of Experts Paper • 2401.04081 • Published Jan 8, 2024 • 73
MMLU Pro benchmark for GGUFs (1 shot) Collection "Not all quantized models perform well"; serving framework: Ollama uses NVIDIA GPU, llama.cpp uses CPU with AVX & AMX • 13 items • Updated Aug 15 • 9
LangSplatV2: High-dimensional 3D Language Gaussian Splatting with 450+ FPS Paper • 2507.07136 • Published Jul 9 • 38
Article Fine-tuning LLMs to 1.58bit: extreme quantization made easy • Sep 18, 2024 • 272
Article Introducing AutoRound: Intel's Advanced Quantization for LLMs and VLMs • Apr 29 • 43
Article Falcon-H1: A Family of Hybrid-Head Language Models Redefining Efficiency and Performance • May 21 • 38