---
title: Post Training Techniques Guide
emoji: 🚀
colorFrom: purple
colorTo: yellow
sdk: docker
app_port: 8501
tags:
- streamlit
pinned: false
short_description: A visual guide to post-training techniques for LLMs
license: mit
---

# 🔧 Beyond Pretraining: A Visual Guide to Post-Training Techniques for LLMs

This deck summarizes the key trade-offs between different post-training strategies for large language models, including:

- 📚 Supervised Fine-Tuning (SFT)
- 🤝 Preference Optimization (DPO, APO, GRPO)
- 🎯 Reinforcement Learning (PPO)

It also introduces a reward spectrum from rule-based to subjective feedback, and compares how real-world models like **SmolLM3**, **Tulu 2/3**, and **DeepSeek-R1** implement these strategies.
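
For a quick sense of what "preference optimization" means in practice, here is a minimal, illustrative sketch of the DPO loss computed over a batch of chosen/rejected completion pairs. It is not taken from the deck; the function name and the PyTorch framing are assumptions made purely for illustration.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps: torch.Tensor,
             policy_rejected_logps: torch.Tensor,
             ref_chosen_logps: torch.Tensor,
             ref_rejected_logps: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    """Illustrative Direct Preference Optimization (DPO) loss.

    Each input is a per-sequence log-probability (the sum of token
    log-probs for one completion) under either the policy being
    trained or the frozen reference model.
    """
    # Implicit rewards: how far the policy has moved from the
    # reference model on each completion.
    chosen_reward = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_reward = beta * (policy_rejected_logps - ref_rejected_logps)
    # Push the chosen completion's implicit reward above the rejected one's.
    return -F.logsigmoid(chosen_reward - rejected_reward).mean()
```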

> This is a companion resource to my ReTool rollout implementation and blog post.
>  
> 📖 [Medium blog post](https://medium.com/@jenwei0312/beyond-generate-a-deep-dive-into-stateful-multi-turn-llm-rollouts-for-tool-use-336b00c99ac0)  
> 💻 [ReTool Hugging Face Space](https://huggingface.co/spaces/bird-of-paradise/ReTool-Implementation)  

---

### 📎 Download the Slides
👉 [PDF version](https://huggingface.co/spaces/bird-of-paradise/post-training-techniques-guide/blob/main/src/Post%20Training%20Techniques.pdf)

---

### 🤝 Reuse & Attribution

This deck is free to share in talks, posts, or documentation, **with attribution**.

Please credit:
**Jen Wei** ([Hugging Face 🤗](https://huggingface.co/bird-of-paradise) | [X/Twitter](https://x.com/JenniferWe17599))  
Optional citation: *“Beyond Pretraining: Post-Training Techniques for LLMs (2025)”*

Licensed under the MIT License.

β€”
*Made with 🧠 by Jen Wei, July 2025*