Articles:
- Introducing AutoRound: Intel's Advanced Quantization for LLMs and VLMs
- Memory-efficient Diffusion Transformers with Quanto and Diffusers (Jul 30, 2024)
- NVIDIA's GTC 2025 Announcement for Physical AI Developers: New Open Models and Datasets (Mar 18)
- Welcome Gemma 3: Google's all new multimodal, multilingual, long context open LLM (Mar 12)
- LLM Inference on Edge: A Fun and Easy Guide to run LLMs via React Native on your Phone! (Mar 7)

Collections:
- Gemma 3 QAT Collection (15 items): Quantization-Aware Trained (QAT) Gemma 3 checkpoints. These models preserve quality similar to half precision while using 3x less memory.

Papers:
- LoftQ: LoRA-Fine-Tuning-Aware Quantization for Large Language Models (arXiv:2310.08659, published Oct 12, 2023)
- Scaling Laws and Compute-Optimal Training Beyond Fixed Training Durations (arXiv:2405.18392, published May 28, 2024)
- BitNet: Scaling 1-bit Transformers for Large Language Models (arXiv:2310.11453, published Oct 17, 2023)