---
title: Post Training Techniques Guide
emoji: 🧠
colorFrom: purple
colorTo: yellow
sdk: docker
app_port: 8501
tags:
- streamlit
pinned: false
short_description: A visual guide to post-training techniques for LLMs
license: mit
---
# 🧠 Beyond Pretraining: A Visual Guide to Post-Training Techniques for LLMs
This deck summarizes the key trade-offs between different post-training strategies for large language models, including:
- Supervised Fine-Tuning (SFT)
- Preference Optimization (DPO, APO, GRPO)
- Reinforcement Learning (PPO)
It also introduces a reward spectrum from rule-based to subjective feedback, and compares how real-world models like **SmolLM3**, **Tulu 2/3**, and **DeepSeek-R1** implement these strategies.
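As a concrete anchor for the preference-optimization side of that spectrum, here is a minimal, framework-agnostic sketch of the DPO objective. It is not taken from the slides; the tensor names and `beta = 0.1` are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps: torch.Tensor,
             policy_rejected_logps: torch.Tensor,
             ref_chosen_logps: torch.Tensor,
             ref_rejected_logps: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    """DPO loss for a batch of preference pairs.

    Each argument is the summed log-probability of a response under either
    the trainable policy or the frozen reference model.
    """
    # Implicit rewards are the policy/reference log-ratios, scaled by beta.
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    # Push the chosen response's implicit reward above the rejected one's.
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()
```

PPO-style RLHF, by contrast, optimizes against an explicit reward signal with policy-gradient updates, which is where the rule-based vs. subjective reward spectrum above comes into play.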
> This is a companion resource to my ReTool rollout implementation and blog post.
>
> [Medium blog post](https://medium.com/@jenwei0312/beyond-generate-a-deep-dive-into-stateful-multi-turn-llm-rollouts-for-tool-use-336b00c99ac0)
> [ReTool Hugging Face Space](https://huggingface.co/spaces/bird-of-paradise/ReTool-Implementation)
---
### Download the Slides
[PDF version](https://huggingface.co/spaces/bird-of-paradise/post-training-techniques-guide/blob/main/src/Post%20Training%20Techniques.pdf)
---
### Reuse & Attribution
This deck is free to share in talks, posts, or documentation, **with attribution**.
Please credit:
**Jen Wei – [Hugging Face 🤗](https://huggingface.co/bird-of-paradise) | [X/Twitter](https://x.com/JenniferWe17599)**
Optional citation: *"Beyond Pretraining: Post-Training Techniques for LLMs (2025)"*
Licensed under MIT License.
---
*Made with 🧠 by Jen Wei, July 2025*