|
--- |
|
title: Post Training Techniques Guide |
|
emoji: 🧠
|
colorFrom: purple |
|
colorTo: yellow |
|
sdk: docker |
|
app_port: 8501 |
|
tags: |
|
- streamlit |
|
pinned: false |
|
short_description: A visual guide to post-training techniques for LLMs |
|
license: mit |
|
--- |
|
|
|
# 🧠 Beyond Pretraining: A Visual Guide to Post-Training Techniques for LLMs
|
|
|
This deck summarizes the key trade-offs between different post-training strategies for large language models, including:
|
|
|
- Supervised Fine-Tuning (SFT)

- Preference Optimization (DPO, APO, GRPO)

- Reinforcement Learning (PPO)
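
To make the first two strategies concrete, here is a minimal PyTorch sketch of the SFT and DPO objectives. It is illustrative only and not taken from the slides; the function names, tensor shapes, and the β value are assumptions.

```python
import torch
import torch.nn.functional as F

def sft_loss(logits, labels, response_mask):
    """Supervised fine-tuning: next-token cross-entropy,
    counted only on response tokens (prompt tokens are masked out)."""
    shift_logits = logits[:, :-1, :]          # predict token t+1 from token t
    shift_labels = labels[:, 1:]
    shift_mask = response_mask[:, 1:].float()
    per_token = F.cross_entropy(
        shift_logits.reshape(-1, shift_logits.size(-1)),
        shift_labels.reshape(-1),
        reduction="none",
    )
    return (per_token * shift_mask.reshape(-1)).sum() / shift_mask.sum()

def dpo_loss(pi_chosen_logp, pi_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Direct Preference Optimization: widen the reference-anchored margin
    between the chosen and rejected completion of each preference pair."""
    chosen_margin = beta * (pi_chosen_logp - ref_chosen_logp)
    rejected_margin = beta * (pi_rejected_logp - ref_rejected_logp)
    return -F.logsigmoid(chosen_margin - rejected_margin).mean()

# Toy DPO call with summed sequence log-probs for two preference pairs.
loss = dpo_loss(torch.tensor([-10.2, -8.7]), torch.tensor([-12.5, -11.0]),
                torch.tensor([-10.9, -9.1]), torch.tensor([-12.1, -10.6]))
```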
|
|
|
It also introduces a reward spectrum from rule-based to subjective feedback, and compares how real-world models like **SmolLM3**, **Tulu 2/3**, and **DeepSeek-R1** implement these strategies. |
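
As a rough illustration of the two ends of that spectrum (again a sketch, not material from the slides; the example strings and field names are invented), a rule-based reward is a programmatic check that can be scored exactly, while subjective feedback typically arrives as preference pairs instead:

```python
import re

def verifiable_reward(completion: str, reference_answer: str) -> float:
    """Rule-based end of the spectrum: a programmatic check,
    e.g. the final \\boxed{...} answer must match the reference."""
    match = re.search(r"\\boxed\{(.+?)\}", completion)
    return 1.0 if match and match.group(1).strip() == reference_answer.strip() else 0.0

# Subjective end of the spectrum: no program can score the output directly,
# so feedback arrives as human (or model) preference pairs, which DPO-style
# methods optimize on directly and RLHF distills into a learned reward model.
preference_example = {
    "prompt": "Explain overfitting to a high-school student.",
    "chosen": "Overfitting is like memorizing last year's exam answers ...",
    "rejected": "Overfitting: empirical risk much lower than expected risk.",
}

print(verifiable_reward(r"... so the result is \boxed{42}.", "42"))  # 1.0
```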
|
|
|
> This is a companion resource to my ReTool rollout implementation and blog post. |
|
> |
|
> [Medium blog post](https://medium.com/@jenwei0312/beyond-generate-a-deep-dive-into-stateful-multi-turn-llm-rollouts-for-tool-use-336b00c99ac0)
|
> [ReTool Hugging Face Space](https://huggingface.co/spaces/bird-of-paradise/ReTool-Implementation)
|
|
|
--- |
|
|
|
### Download the Slides
|
[PDF version](https://huggingface.co/spaces/bird-of-paradise/post-training-techniques-guide/blob/main/src/Post%20Training%20Techniques.pdf)
|
|
|
--- |
|
|
|
### Reuse & Attribution
|
|
|
This deck is free to share in talks, posts, or documentation, **with attribution**.
|
|
|
Please credit: |
|
**Jen Wei · [Hugging Face 🤗](https://huggingface.co/bird-of-paradise) | [X/Twitter](https://x.com/JenniferWe17599)**
|
Optional citation: *“Beyond Pretraining: Post-Training Techniques for LLMs (2025)”*
|
|
|
Licensed under the MIT License.
|
|
|
---
|
*Made with 🧠 by Jen Wei, July 2025*
|
|
|
|