Article The Transformers Library: standardizing model definitions By lysandre and 3 others • May 15 • 114
Article You could have designed state of the art positional encoding By FL33TW00D-HF • Nov 25, 2024 • 305
Article Welcome Llama 4 Maverick & Scout on Hugging Face! By burtenshaw and 6 others • Apr 5 • 145
Article LLM Inference on Edge: A Fun and Easy Guide to run LLMs via React Native on your Phone! By medmekk and 1 other • Mar 7 • 65
Article Open-source DeepResearch – Freeing our search agents By m-ric and 4 others • Feb 4 • 1.26k
Running 2.73k The Ultra-Scale Playbook 🌌 The ultimate guide to training LLM on large GPU Clusters
Article Open-R1: a fully open reproduction of DeepSeek-R1 By eliebak and 2 others • Jan 28 • 868
Domino: Eliminating Communication in LLM Training via Generic Tensor Slicing and Overlapping Paper • 2409.15241 • Published Sep 23, 2024 • 1