Article Decoding Strategies in Large Language Models By mlabonne • Oct 29, 2024 • 76
Article KV Caching Explained: Optimizing Transformer Inference Efficiency By not-lain • Jan 30 • 109
Article You could have designed state of the art positional encoding By FL33TW00D-HF • Nov 25, 2024 • 338
Article Efficient LLM Pretraining: Packed Sequences and Masked Attention By sirluk • Oct 7, 2024 • 45
Article SmolLM - blazingly fast and remarkably powerful By loubnabnl and 2 others • Jul 16, 2024 • 406
Article Agentic Task Delegation - Making Agents whole again By adarshxs • Aug 5, 2024 • 6
Llama 2: Open Foundation and Fine-Tuned Chat Models Paper • 2307.09288 • Published Jul 18, 2023 • 243