|
--- |
|
title: Post Training Techniques Guide |
|
emoji: 🧠
|
colorFrom: purple |
|
colorTo: yellow |
|
sdk: docker |
|
app_port: 8501 |
|
tags: |
|
- streamlit |
|
pinned: false |
|
short_description: A visual guide to post-training techniques for LLMs |
|
license: mit |
|
--- |
|
|
|
# 🧠 Beyond Pretraining: A Visual Guide to Post-Training Techniques for LLMs
|
|
|
This deck summarizes the key trade-offs between different post-training strategies for large language models, including:
|
|
|
- Supervised Fine-Tuning (SFT)

- Preference Optimization (DPO, APO, GRPO)

- Reinforcement Learning (PPO)
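
To make the first two strategies concrete, here is a minimal PyTorch sketch of the SFT and DPO objectives. It is illustrative only and not taken from the slides; the function names, tensor shapes, and the β value are assumptions.

```python
import torch
import torch.nn.functional as F

def sft_loss(logits, labels, response_mask):
    """Supervised fine-tuning: next-token cross-entropy,
    counted only on response tokens (prompt tokens are masked out)."""
    shift_logits = logits[:, :-1, :]          # predict token t+1 from token t
    shift_labels = labels[:, 1:]
    shift_mask = response_mask[:, 1:].float()
    per_token = F.cross_entropy(
        shift_logits.reshape(-1, shift_logits.size(-1)),
        shift_labels.reshape(-1),
        reduction="none",
    )
    return (per_token * shift_mask.reshape(-1)).sum() / shift_mask.sum()

def dpo_loss(pi_chosen_logp, pi_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Direct Preference Optimization: widen the reference-anchored margin
    between the chosen and rejected completion of each preference pair."""
    chosen_margin = beta * (pi_chosen_logp - ref_chosen_logp)
    rejected_margin = beta * (pi_rejected_logp - ref_rejected_logp)
    return -F.logsigmoid(chosen_margin - rejected_margin).mean()

# Toy DPO call with summed sequence log-probs for two preference pairs.
loss = dpo_loss(torch.tensor([-10.2, -8.7]), torch.tensor([-12.5, -11.0]),
                torch.tensor([-10.9, -9.1]), torch.tensor([-12.1, -10.6]))
```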
|
|
|
It also introduces a reward spectrum from rule-based to subjective feedback, and compares how real-world models like **SmolLM3**, **Tulu 2/3**, and **DeepSeek-R1** implement these strategies. |
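
As a rough illustration of the two ends of that spectrum (again a sketch, not material from the slides; the example strings and field names are invented), a rule-based reward is a programmatic check that can be scored exactly, while subjective feedback typically arrives as preference pairs instead:

```python
import re

def verifiable_reward(completion: str, reference_answer: str) -> float:
    """Rule-based end of the spectrum: a programmatic check,
    e.g. the final \\boxed{...} answer must match the reference."""
    match = re.search(r"\\boxed\{(.+?)\}", completion)
    return 1.0 if match and match.group(1).strip() == reference_answer.strip() else 0.0

# Subjective end of the spectrum: no program can score the output directly,
# so feedback arrives as human (or model) preference pairs, which DPO-style
# methods optimize on directly and RLHF distills into a learned reward model.
preference_example = {
    "prompt": "Explain overfitting to a high-school student.",
    "chosen": "Overfitting is like memorizing last year's exam answers ...",
    "rejected": "Overfitting: empirical risk much lower than expected risk.",
}

print(verifiable_reward(r"... so the result is \boxed{42}.", "42"))  # 1.0
```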
|
|
|
> This is a companion resource to my ReTool rollout implementation and blog post. |
|
> |
|
> [Medium blog post](https://medium.com/@jenwei0312/beyond-generate-a-deep-dive-into-stateful-multi-turn-llm-rollouts-for-tool-use-336b00c99ac0)
|
> [ReTool Hugging Face Space](https://huggingface.co/spaces/bird-of-paradise/ReTool-Implementation)
|
|
|
--- |
|
|
|
### Download the Slides
|
[PDF version](https://huggingface.co/spaces/bird-of-paradise/post-training-techniques-guide/blob/main/src/Post%20Training%20Techniques.pdf)
|
|
|
--- |
|
|
|
### Reuse & Attribution
|
|
|
This deck is free to share in talks, posts, or documentation, **with attribution**.
|
|
|
Please credit: |
|
**Jen Wei · [Hugging Face 🤗](https://huggingface.co/bird-of-paradise) | [X/Twitter](https://x.com/JenniferWe17599)**
|
Optional citation: *“Beyond Pretraining: Post-Training Techniques for LLMs (2025)”*
|
|
|
Licensed under the MIT License.
|
|
|
---
|
*Made with 🧠 by Jen Wei, July 2025*
|
|
|
|