---
title: Post Training Techniques Guide
emoji: 🧠
colorFrom: purple
colorTo: yellow
sdk: docker
app_port: 8501
tags:
  - streamlit
pinned: false
short_description: A visual guide to post-training techniques for LLMs
license: mit
---
# 🧠 Beyond Pretraining: A Visual Guide to Post-Training Techniques for LLMs
This deck summarizes the key trade-offs between different post-training strategies for large language models, including:
- Supervised Fine-Tuning (SFT)
- Preference Optimization (DPO, APO, GRPO); a minimal DPO sketch follows this list
- Reinforcement Learning (PPO)
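For orientation only (this code is not part of the slide deck), here is a minimal sketch of the DPO objective that the preference-optimization family builds on; the function name, signature, and the `beta` default are illustrative assumptions rather than a reference implementation:

```python
# Minimal DPO loss sketch (illustrative; assumes PyTorch is installed).
# Inputs are per-example summed log-probabilities of the chosen/rejected
# responses under the trainable policy and a frozen reference model.
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps: torch.Tensor,
             policy_rejected_logps: torch.Tensor,
             ref_chosen_logps: torch.Tensor,
             ref_rejected_logps: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    # Implicit rewards are log-ratio differences against the reference model.
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    # Maximize the margin between preferred and dispreferred responses.
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()
```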
It also introduces a reward spectrum from rule-based to subjective feedback, and compares how real-world models like SmolLM3, Tulu 2/3, and DeepSeek-R1 implement these strategies.
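To make the reward spectrum concrete, the sketch below contrasts a rule-based, verifiable reward with a subjective reward scored by a learned model; the `\boxed{}` answer convention and the `reward_model.score()` interface are hypothetical, chosen only to illustrate the two ends of the spectrum:

```python
# Illustrative contrast (not from the deck): rule-based vs. subjective rewards.
import re

def rule_based_reward(completion: str, reference_answer: str) -> float:
    """Binary, verifiable reward: 1.0 if the boxed answer matches exactly."""
    match = re.search(r"\\boxed\{(.+?)\}", completion)
    return 1.0 if match and match.group(1).strip() == reference_answer else 0.0

def subjective_reward(completion: str, reward_model) -> float:
    """Scalar reward from a learned preference model (assumed .score() API)."""
    return float(reward_model.score(completion))
```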
This is a companion resource to my ReTool rollout implementation and blog post.
## Download the Slides
- PDF version
## Reuse & Attribution
This deck is free to share in talks, posts, or documentation, with attribution.
Please credit:
Jen Wei (Hugging Face 🤗 | X/Twitter)
Optional citation: "Beyond Pretraining: Post-Training Techniques for LLMs (2025)"
Licensed under the MIT License.
Made with 🧠 by Jen Wei, July 2025