view article Article DeepSeek-R1 Dissection: Understanding PPO & GRPO Without Any Prior Reinforcement Learning Knowledge By NormalUhr • Feb 7 • 204
view article Article From GRPO to DAPO and GSPO: What, Why, and How By NormalUhr • 5 days ago • 8
MM-Eval: A Multilingual Meta-Evaluation Benchmark for LLM-as-a-Judge and Reward Models Paper • 2410.17578 • Published Oct 23, 2024 • 1
LLM-as-an-Interviewer: Beyond Static Testing Through Dynamic LLM Evaluation Paper • 2412.10424 • Published Dec 10, 2024 • 2
Running 3.04k 3.04k The Ultra-Scale Playbook 🌌 The ultimate guide to training LLM on large GPU Clusters
EXAONE-3.5 Collection EXAONE 3.5 language model series including instruction-tuned models of 2.4B, 7.8B, and 32B • 11 items • Updated Jul 7 • 119
Evaluating Language Models as Synthetic Data Generators Paper • 2412.03679 • Published Dec 4, 2024 • 49
Evaluating Language Models as Synthetic Data Generators Paper • 2412.03679 • Published Dec 4, 2024 • 49