import streamlit as st

# Page configuration: browser-tab title and a centered layout.
st.set_page_config(page_title="Post-Training Techniques for LLMs", layout="centered")

st.title("🔧 Beyond Pretraining: Post-Training Techniques for LLMs")
st.subheader("Distillation, Preference Optimization, and RLHF — Visualized")

# Intro text: what the guide covers and where to download the full slide deck.
st.markdown("""
This Streamlit app hosts a visual guide to navigating post-training strategies for language models, with real-world examples such as **SmolLM3**, **Tulu**, and **DeepSeek-R1**.

📎 Download the full slide deck:
👉 [Click here to download (PDF)](https://huggingface.co/spaces/bird-of-paradise/post-training-techniques-guide/blob/main/src/Post%20Training%20Techniques.pdf)

---

🧠 **Topics covered:**
- Tradeoffs between SFT, DPO/APO/GRPO, and PPO
- Subjective vs. rule-based rewards
- How real open-source models chose their strategies

Made with ❤️ by Jen Wei
""")

# Optional: Slide preview
st.image("src/Post_Training_Techniques_preview_2.png", caption="Slide 1: Tradeoffs between Optimization Paths", use_container_width=True)
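
# Sketch (not part of the original app): the slide deck could also be offered as a
# direct in-app download via st.download_button. The local path below is an
# assumption, mirroring the PDF linked in the intro ("src/Post Training Techniques.pdf");
# uncomment once that path is confirmed to exist inside the Space.
#
# with open("src/Post Training Techniques.pdf", "rb") as pdf_file:
#     st.download_button(
#         label="Download the slide deck (PDF)",
#         data=pdf_file,
#         file_name="Post Training Techniques.pdf",
#         mime="application/pdf",
#     )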