view article Article SmolLM3: smol, multilingual, long-context reasoner By loubnabnl and 22 others β’ Jul 8 β’ 625
SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training Paper β’ 2501.17161 β’ Published Jan 28 β’ 123
view article Article Open-R1: a fully open reproduction of DeepSeek-R1 By eliebak and 2 others β’ Jan 28 β’ 877
view article Article Synthetic Data Generation with FastData and Hugging Face By asoria β’ Jan 7 β’ 15
ModernBERT Collection Bringing BERT into modernity via both architecture changes and scaling β’ 3 items β’ Updated Dec 19, 2024 β’ 149
Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Finetuning and Inference Paper β’ 2412.13663 β’ Published Dec 18, 2024 β’ 153
Qwen2 Collection Qwen2 language models, including pretrained and instruction-tuned models of 5 sizes, including 0.5B, 1.5B, 7B, 57B-A14B, and 72B. β’ 39 items β’ Updated 24 days ago β’ 368