Update app.py
app.py (changed)
@@ -35,19 +35,32 @@ resource constraints. To train models that are robust to truncated thinking, we
 introduce a lightweight `budget-constrained rollout` strategy, integrated into GRPO,
 which teaches the model to reason adaptively when the thinking process is cut
 short and generalizes effectively to unseen budget constraints without additional
-training.
+training.
+""")
+gr.HTML("""
 <p align="center">
 <img src="figs/framework.png" width="80%" />
 </p>
-
-
+""")
+gr.Markdown(
+"""
 **Main Takeaways**
 1. Thinking + Solution are explicitly separated with independent budgets, boosting reliability under tight compute constraints.
 2. Budget-Constrained Rollout: We train models to handle truncated reasoning using GRPO.
 3. Flexible scalability: Robust performance across diverse inference budgets on reasoning benchmarks like AIME and LiveCodeBench.
 4. Better performance with fewer tokens: Our trained model generates outputs that are 30% shorter while maintaining (or even improving) accuracy.
 
+<p align="center">
+<img src="figs/aime.png" width="46%" />
+<img src="figs/livecode.png" width="48%" />
+</p>
 
+<p align="center">
+<img src="figs/codetable.png" width="90%" />
+</p>
+""")
+gr.Markdown(
+"""
 ## Citation
 
 
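The app text describes separating thinking and solution into independent budgets and cutting the thinking process short when its budget runs out. The sketch below illustrates that two-phase idea. It is not the Space's or the paper's implementation. The `generate` callable, the `</think>` delimiter, the parameter names `think_budget` / `solution_budget`, and `toy_generate` are all hypothetical, introduced only for this example.

```python
from typing import Callable, List

# Hypothetical generator: (prompt tokens, max new tokens) -> generated tokens.
Generator = Callable[[List[str], int], List[str]]

END_THINK = "</think>"  # assumed delimiter between thinking and solution


def budgeted_rollout(
    generate: Generator,
    prompt: List[str],
    think_budget: int,
    solution_budget: int,
) -> List[str]:
    """Sample thinking and solution with independent token budgets (sketch)."""
    # Phase 1: thinking, capped at think_budget tokens.
    thinking = generate(prompt, think_budget)[:think_budget]
    if END_THINK in thinking:
        # The model closed its reasoning early; drop the delimiter and anything after it.
        thinking = thinking[: thinking.index(END_THINK)]
    # Whether thinking ended naturally or was truncated, force the switch to the solution.
    context = prompt + thinking + [END_THINK]
    # Phase 2: solution, capped by its own budget.
    solution = generate(context, solution_budget)[:solution_budget]
    return thinking + [END_THINK] + solution


def toy_generate(prompt: List[str], max_new_tokens: int) -> List[str]:
    # Hypothetical stand-in for a model: "step" tokens while thinking,
    # "answer" tokens once the prompt contains END_THINK. Never stops early.
    token = "answer" if END_THINK in prompt else "step"
    return [token] * max_new_tokens


print(budgeted_rollout(toy_generate, ["q"], 3, 2))
# ['step', 'step', 'step', '</think>', 'answer', 'answer']
```

Because the toy generator never emits `</think>` on its own, its thinking is truncated at exactly `think_budget` tokens. The solution still receives its full separate budget, which is the property the first takeaway describes.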
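The budget-constrained rollouts are trained with GRPO. GRPO scores each sampled rollout against the other rollouts drawn for the same prompt, not against a learned value function. Below is a minimal sketch of that group-relative advantage, `(r_i - mean(r)) / (std(r) + eps)`. It assumes binary correctness rewards and uses the population standard deviation. The function name, `eps`, and the reward values are illustrative assumptions, not taken from the app.

```python
import statistics
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """Normalize each rollout's reward against its own group (GRPO-style sketch)."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std over the group
    # eps guards against division by zero when all rewards are equal.
    return [(r - mean) / (std + eps) for r in rewards]


# Four rollouts for one prompt: reward 1.0 if the final answer is correct.
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
print([round(a, 3) for a in advantages])
# [1.0, -1.0, -1.0, 1.0]
```

When every rollout in a group receives the same reward, all advantages are zero. That group then contributes no policy-gradient signal through this term.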