Article Activation Steering: A New Frontier in AI Control – But Does It Scale? By royswastik • Feb 2 • 3
Article Gemma 3n fully available in the open-source ecosystem! By ariG23498 and 7 others • Jun 26 • 114
Article StackLLaMA: A hands-on guide to train LLaMA with RLHF By edbeeching and 6 others • Apr 5, 2023 • 42
Article DeepSeek-R1 Dissection: Understanding PPO & GRPO Without Any Prior Reinforcement Learning Knowledge By NormalUhr • Feb 7 • 201
100 Days After DeepSeek-R1: A Survey on Replication Studies and More Directions for Reasoning Language Models Paper • 2505.00551 • Published May 1 • 37
LLMs for Engineering: Teaching Models to Design High Powered Rockets Paper • 2504.19394 • Published Apr 27 • 14
AdaR1: From Long-CoT to Hybrid-CoT via Bi-Level Adaptive Reasoning Optimization Paper • 2504.21659 • Published Apr 30 • 13
Grokking in the Wild: Data Augmentation for Real-World Multi-Hop Reasoning with Transformers Paper • 2504.20752 • Published Apr 29 • 93