yuhuixu committed · Commit 6fe96dd · verified · 1 Parent(s): 2a0a8b1

Update app.py

Files changed (1)
  1. app.py +16 -3
app.py CHANGED
@@ -35,19 +35,32 @@ resource constraints. To train models that are robust to truncated thinking, we
35
  introduce a lightweight `budget-constrained rollout` strategy, integrated into GRPO,
36
  which teaches the model to reason adaptively when the thinking process is cut
37
  short and generalizes effectively to unseen budget constraints without additional
38
- training.
 
 
39
  <p align="center">
40
  <img src="figs/framework.png" width="80%" />
41
  </p>
42
-
43
-
 
44
  **Main Takeaways**
45
  1. ✂️ Thinking + Solution are explicitly separated with independent budgets — boosting reliability under tight compute constraints.
46
  2. 🧠 Budget-Constrained Rollout: We train models to handle truncated reasoning using GRPO.
47
  3. 📈 Flexible scalability: Robust performance across diverse inference budgets on reasoning benchmarks like AIME and LiveCodeBench.
48
  4. ⚙️ Better performance with fewer tokens: Our trained model generates outputs that are 30% shorter while maintaining (or even improving) accuracy.
49
 
 
 
 
 
50
 
 
 
 
 
 
 
51
  ## Citation
52
 
53
 
 
35
  introduce a lightweight `budget-constrained rollout` strategy, integrated into GRPO,
36
  which teaches the model to reason adaptively when the thinking process is cut
37
  short and generalizes effectively to unseen budget constraints without additional
38
+ training.
39
+ """)
40
+ gr.HTML("""
41
  <p align="center">
42
  <img src="figs/framework.png" width="80%" />
43
  </p>
44
+ """)
45
+ gr.Markdown(
46
+ """
47
  **Main Takeaways**
48
  1. ✂️ Thinking + Solution are explicitly separated with independent budgets — boosting reliability under tight compute constraints.
49
  2. 🧠 Budget-Constrained Rollout: We train models to handle truncated reasoning using GRPO.
50
  3. 📈 Flexible scalability: Robust performance across diverse inference budgets on reasoning benchmarks like AIME and LiveCodeBench.
51
  4. ⚙️ Better performance with fewer tokens: Our trained model generates outputs that are 30% shorter while maintaining (or even improving) accuracy.
52
 
53
+ <p align="center">
54
+ <img src="figs/aime.png" width="46%" />
55
+ <img src="figs/livecode.png" width="48%" />
56
+ </p>
57
 
58
+ <p align="center">
59
+ <img src="figs/codetable.png" width="90%" />
60
+ </p>
61
+ """)
62
+ gr.Markdown(
63
+ """
64
  ## Citation
65
 
66
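Reviewer note: takeaway 1 in the Markdown text says the thinking and solution segments get independent budgets. The following is only a minimal sketch of that idea, not code from `app.py` or from the authors' training pipeline. It assumes budgets are plain token counts and that truncation simply cuts each segment at its own limit; the function name `apply_budgets` and its parameters are hypothetical.

```python
from typing import List, Tuple


def apply_budgets(
    thinking: List[str],
    solution: List[str],
    thinking_budget: int,
    solution_budget: int,
) -> Tuple[List[str], List[str], bool]:
    """Cut each segment at its own token budget (hypothetical helper).

    Returns the kept thinking tokens, the kept solution tokens, and a flag
    telling whether the thinking segment had to be truncated.
    """
    if thinking_budget < 0 or solution_budget < 0:
        raise ValueError("budgets must be non-negative")

    # The thinking segment is cut independently of the solution segment,
    # so a long chain of thought cannot consume the solution's allowance.
    thinking_was_cut = len(thinking) > thinking_budget
    kept_thinking = thinking[:thinking_budget]
    kept_solution = solution[:solution_budget]
    return kept_thinking, kept_solution, thinking_was_cut


if __name__ == "__main__":
    # Toy tokens, assumed for illustration only.
    think = ["step1", "step2", "step3", "step4"]
    answer = ["x", "=", "2"]
    print(apply_budgets(think, answer, thinking_budget=2, solution_budget=8))
```

Under these assumptions, a model that overruns its thinking budget still has its full solution budget available, which is the reliability property the takeaway describes.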