---
title: Post Training Techniques Guide
emoji: 🚀
colorFrom: purple
colorTo: yellow
sdk: docker
app_port: 8501
tags:
  - streamlit
pinned: false
short_description: A visual guide to post-training techniques for LLMs
license: mit
---

πŸ”§ Beyond Pretraining: A Visual Guide to Post-Training Techniques for LLMs

This deck summarizes the key trade-offs between different post-training strategies for large language models β€” including:

  • πŸ“š Supervised Fine-Tuning (SFT)
  • 🀝 Preference Optimization (DPO, APO, GRPO)
  • 🎯 Reinforcement Learning (PPO)
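
To make the preference-optimization item concrete, here is a minimal sketch of the DPO loss in PyTorch. It is an illustration only, not the implementation used in the deck; the function name `dpo_loss` and its arguments (per-sequence log-probabilities under the policy and a frozen reference model) are assumptions chosen for clarity.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """Direct Preference Optimization loss (illustrative sketch).

    Each argument is a tensor of summed token log-probabilities for a batch of
    (chosen, rejected) completion pairs under the policy or the frozen
    reference model. `beta` controls how far the policy may drift from the
    reference.
    """
    # Log-ratio of policy vs. reference for preferred and dispreferred answers
    chosen_logratio = policy_chosen_logps - ref_chosen_logps
    rejected_logratio = policy_rejected_logps - ref_rejected_logps
    # DPO pushes the margin between the two log-ratios apart
    logits = beta * (chosen_logratio - rejected_logratio)
    return -F.logsigmoid(logits).mean()

# Example with dummy per-sequence log-probabilities for a batch of two pairs
loss = dpo_loss(
    policy_chosen_logps=torch.tensor([-12.3, -8.1]),
    policy_rejected_logps=torch.tensor([-14.0, -9.5]),
    ref_chosen_logps=torch.tensor([-12.9, -8.4]),
    ref_rejected_logps=torch.tensor([-13.1, -9.2]),
)
```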

It also introduces a reward spectrum from rule-based to subjective feedback, and compares how real-world models like SmolLM3, Tulu 2/3, and DeepSeek-R1 implement these strategies.
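
That spectrum can be pictured in a few lines of code: a rule-based reward is a deterministic check you can verify, while a subjective reward delegates scoring to a learned model. The sketch below is illustrative only; `reward_model` is a hypothetical callable standing in for any scorer that returns a scalar.

```python
import re

def rule_based_reward(completion: str, reference_answer: str) -> float:
    """Verifiable end of the spectrum: exact check against a known answer."""
    numbers = re.findall(r"-?\d+(?:\.\d+)?", completion)
    return 1.0 if numbers and numbers[-1] == reference_answer else 0.0

def subjective_reward(prompt: str, completion: str, reward_model) -> float:
    """Subjective end of the spectrum: a learned reward model scores the reply.

    `reward_model` is a hypothetical callable returning a scalar score.
    """
    return float(reward_model(prompt, completion))
```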

This is a companion resource to my ReTool rollout implementation and blog post.

πŸ“– Medium blog post
πŸ’» ReTool Hugging Face Space


πŸ“Ž Download the Slides

πŸ‘‰ PDF version


## 🤝 Reuse & Attribution

This deck is free to share in talks, posts, or documentation β€” with attribution.

Please credit: Jen Wei β€” Hugging Face πŸ€— | X/Twitter
Optional citation: β€œBeyond Pretraining: Post-Training Techniques for LLMs (2025)”

Licensed under the MIT License.

β€” Made with 🧠 by Jen Wei, July 2025