bird-of-paradise committed on
Commit cd718f4 · verified · 1 Parent(s): 91ca9b3

first commit

Files changed (1)
  1. README.md +35 -6
README.md CHANGED
@@ -1,8 +1,8 @@
  ---
  title: Post Training Techniques Guide
  emoji: 🚀
- colorFrom: red
- colorTo: red
  sdk: docker
  app_port: 8501
  tags:
@@ -12,9 +12,38 @@ short_description: A visual guide to post-training techniques for LLMs
  license: mit
  ---

- # Welcome to Streamlit!

- Edit `/src/streamlit_app.py` to customize this app to your heart's desire. :heart:

- If you have any questions, checkout our [documentation](https://docs.streamlit.io) and [community
- forums](https://discuss.streamlit.io).
  ---
  title: Post Training Techniques Guide
  emoji: 🚀
+ colorFrom: purple
+ colorTo: yellow
  sdk: docker
  app_port: 8501
  tags:
 
  license: mit
  ---

+ # 🔧 Beyond Pretraining: A Visual Guide to Post-Training Techniques for LLMs

+ This deck summarizes the key trade-offs between different post-training strategies for large language models, including:
+
+ - 📚 Supervised Fine-Tuning (SFT)
+ - 🤝 Preference Optimization (DPO, APO, GRPO)
+ - 🎯 Reinforcement Learning (PPO)
+
+ It also introduces a reward spectrum from rule-based to subjective feedback, and compares how real-world models like **SmolLM3**, **Tulu 2/3**, and **DeepSeek-R1** implement these strategies.
+
+ > This is a companion resource to my ReTool rollout implementation and blog post.
+ >
+ > 📖 [Medium blog post](https://medium.com/@jenwei0312/beyond-generate-a-deep-dive-into-stateful-multi-turn-llm-rollouts-for-tool-use-336b00c99ac0)
+ > 💻 [ReTool Hugging Face Space](https://huggingface.co/spaces/bird-of-paradise/ReTool-Implementation)
+
+ ---
+
+ ### 📎 Download the Slides
+ 👉 [PDF version](https://huggingface.co/spaces/bird-of-paradise/post-training-techniques-guide/blob/main/src/Post%20Training%20Techniques.pdf)
+
+ ---
+
+ ### 🤝 Reuse & Attribution
+
+ This deck is free to share in talks, posts, or documentation, **with attribution**.
+
+ Please credit:
+ **Jen Wei: [Hugging Face 🤗](https://huggingface.co/bird-of-paradise) | [X/Twitter](https://x.com/JenniferWe17599)**
+ Optional citation: *“Beyond Pretraining: Post-Training Techniques for LLMs (2025)”*
+
+ Licensed under the MIT License.
+
+ ---
+ *Made with 🧠 by Jen Wei, July 2025*