---
title: Post Training Techniques Guide
emoji: 🚀
colorFrom: purple
colorTo: yellow
sdk: docker
app_port: 8501
tags:
  - streamlit
pinned: false
short_description: A visual guide to post-training techniques for LLMs
license: mit
---

πŸ”§ Beyond Pretraining: A Visual Guide to Post-Training Techniques for LLMs

This deck summarizes the key trade-offs between different post-training strategies for large language models β€” including:

  • πŸ“š Supervised Fine-Tuning (SFT)
  • 🀝 Preference Optimization (DPO, APO, GRPO)
  • 🎯 Reinforcement Learning (PPO)
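
To make the preference-optimization item concrete, here is a minimal sketch of the DPO loss in PyTorch. It is an illustration only, not the implementation used in the deck; the function name `dpo_loss` and its arguments (per-sequence log-probabilities under the policy and a frozen reference model) are assumptions chosen for clarity.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """Direct Preference Optimization loss (illustrative sketch).

    Each argument is a tensor of summed token log-probabilities for a batch of
    (chosen, rejected) completion pairs under the policy or the frozen
    reference model. `beta` controls how far the policy may drift from the
    reference.
    """
    # Log-ratio of policy vs. reference for preferred and dispreferred answers
    chosen_logratio = policy_chosen_logps - ref_chosen_logps
    rejected_logratio = policy_rejected_logps - ref_rejected_logps
    # DPO pushes the margin between the two log-ratios apart
    logits = beta * (chosen_logratio - rejected_logratio)
    return -F.logsigmoid(logits).mean()

# Example with dummy per-sequence log-probabilities for a batch of two pairs
loss = dpo_loss(
    policy_chosen_logps=torch.tensor([-12.3, -8.1]),
    policy_rejected_logps=torch.tensor([-14.0, -9.5]),
    ref_chosen_logps=torch.tensor([-12.9, -8.4]),
    ref_rejected_logps=torch.tensor([-13.1, -9.2]),
)
```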

It also introduces a reward spectrum from rule-based to subjective feedback, and compares how real-world models like SmolLM3, Tulu 2/3, and DeepSeek-R1 implement these strategies.
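
That spectrum can be pictured in a few lines of code: a rule-based reward is a deterministic check you can verify, while a subjective reward delegates scoring to a learned model. The sketch below is illustrative only; `reward_model` is a hypothetical callable standing in for any scorer that returns a scalar.

```python
import re

def rule_based_reward(completion: str, reference_answer: str) -> float:
    """Verifiable end of the spectrum: exact check against a known answer."""
    numbers = re.findall(r"-?\d+(?:\.\d+)?", completion)
    return 1.0 if numbers and numbers[-1] == reference_answer else 0.0

def subjective_reward(prompt: str, completion: str, reward_model) -> float:
    """Subjective end of the spectrum: a learned reward model scores the reply.

    `reward_model` is a hypothetical callable returning a scalar score.
    """
    return float(reward_model(prompt, completion))
```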

This is a companion resource to my ReTool rollout implementation and blog post.

πŸ“– Medium blog post
πŸ’» ReTool Hugging Face Space


πŸ“Ž Download the Slides

πŸ‘‰ PDF version


## 🤝 Reuse & Attribution

This deck is free to share in talks, posts, or documentation β€” with attribution.

Please credit: Jen Wei β€” Hugging Face πŸ€— | X/Twitter
Optional citation: β€œBeyond Pretraining: Post-Training Techniques for LLMs (2025)”

Licensed under the MIT License.

β€” Made with 🧠 by Jen Wei, July 2025