- Decoding Strategies in Large Language Models, by mlabonne (Oct 29, 2024, 76 upvotes)
- KV Caching Explained: Optimizing Transformer Inference Efficiency, by not-lain (Jan 30, 108 upvotes)
- You could have designed state of the art positional encoding, by FL33TW00D-HF (Nov 25, 2024, 336 upvotes)
- Efficient LLM Pretraining: Packed Sequences and Masked Attention, by sirluk (Oct 7, 2024, 45 upvotes)
- SmolLM - blazingly fast and remarkably powerful, by loubnabnl and 2 others (Jul 16, 2024, 405 upvotes)