first commit
README.md CHANGED

@@ -1,8 +1,8 @@
 ---
 title: Post Training Techniques Guide
 emoji: π
-colorFrom:
-colorTo:
+colorFrom: purple
+colorTo: yellow
 sdk: docker
 app_port: 8501
 tags:
@@ -12,9 +12,38 @@ short_description: A visual guide to post-training techniques for LLMs
 license: mit
 ---
 
-# 
-
-If you have any questions, checkout our [documentation](https://docs.streamlit.io) and [community
-forums](https://discuss.streamlit.io).
+# 🧠 Beyond Pretraining: A Visual Guide to Post-Training Techniques for LLMs
+
+This deck summarizes the key trade-offs between different post-training strategies for large language models, including:
+
+- Supervised Fine-Tuning (SFT)
+- Preference Optimization (DPO, APO, GRPO)
+- Reinforcement Learning (PPO)
+
+It also introduces a reward spectrum from rule-based to subjective feedback, and compares how real-world models like **SmolLM3**, **Tulu 2/3**, and **DeepSeek-R1** implement these strategies.
+
+> This is a companion resource to my ReTool rollout implementation and blog post.
+>
+> [Medium blog post](https://medium.com/@jenwei0312/beyond-generate-a-deep-dive-into-stateful-multi-turn-llm-rollouts-for-tool-use-336b00c99ac0)
+> 💻 [ReTool Hugging Face Space](https://huggingface.co/spaces/bird-of-paradise/ReTool-Implementation)
+
+---
+
+### Download the Slides
+[PDF version](https://huggingface.co/spaces/bird-of-paradise/post-training-techniques-guide/blob/main/src/Post%20Training%20Techniques.pdf)
+
+---
+
+### Reuse & Attribution
+
+This deck is free to share in talks, posts, or documentation, **with attribution**.
+
+Please credit:
+**Jen Wei: [Hugging Face 🤗](https://huggingface.co/bird-of-paradise) | [X/Twitter](https://x.com/JenniferWe17599)**
+Optional citation: *"Beyond Pretraining: Post-Training Techniques for LLMs (2025)"*
+
+Licensed under the MIT License.
+
+---
+
+*Made with 🧠 by Jen Wei, July 2025*
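
Of the techniques the README lists, DPO is the easiest to capture in a few lines. The sketch below is illustrative only and is not taken from the deck or the linked ReTool implementation; the function and argument names are made up, and it assumes the summed per-response log-probabilities under the policy and a frozen reference model are already available.

```python
import math

def dpo_loss(policy_logp_chosen, policy_logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """DPO loss for a single preference pair (illustrative sketch).

    Each argument is the summed token log-probability of the chosen or
    rejected response under the trainable policy or the frozen reference
    model.
    """
    # Implicit reward margin: how much more the policy favors the chosen
    # response over the rejected one, relative to the reference model.
    margin = beta * ((policy_logp_chosen - ref_logp_chosen)
                     - (policy_logp_rejected - ref_logp_rejected))
    # -log(sigmoid(margin)), written in the equivalent log1p form.
    return math.log1p(math.exp(-margin))

# The loss falls as the policy ranks the chosen response above the rejected one.
well_ranked = dpo_loss(-12.0, -20.0, -14.0, -19.0)  # positive margin
mis_ranked = dpo_loss(-20.0, -12.0, -14.0, -19.0)   # negative margin
print(f"{well_ranked:.4f} < {mis_ranked:.4f}")
```

The `beta` temperature (0.1 here, a common default) scales how strongly the policy is pushed away from the reference model; this trade-off between preference fit and staying close to the reference is one of the comparisons the deck draws between DPO-style methods and PPO.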