yuhuixu committed
Commit 92ef534 · verified · 1 Parent(s): c51f254

Update app.py

  1. app.py +127 -2
app.py CHANGED
@@ -4,8 +4,133 @@ import gradio as gr
 with gr.Blocks() as demo:
     gr.Markdown(
     """
- # Hello World!
- Start typing below to see the output.
+ <div align="center">
+
+ # Elastic Reasoning
+ <div>
+ <div>
+ <h3>🚀 Scalable Chain of Thoughts via Elastic Reasoning 🌟</h3>
+ </div>
+ </div>
+ <br>
+ <div align="center">
+
+ [![Paper](https://img.shields.io/badge/paper-A42C25?style=for-the-badge&logo=arxiv&logoColor=white)](https://arxiv.org/pdf/2505.05315)
+ [![Hugging Face Collection](https://img.shields.io/badge/E1-fcd022?style=for-the-badge&logo=huggingface&logoColor=000&labelColor)](https://huggingface.co/collections/Salesforce/elastic-reasoning-682b4bba108d6ea0a8bab275)
+ [![GitHub](https://img.shields.io/badge/Elastic_Reasoning-000000?style=for-the-badge&logo=github&logoColor=000&logoColor=white)](https://github.com/SalesforceAIResearch/Elastic-Reasoning)
+
+ </div>
+ </div>
+
+
+ ## Table of Contents
+ - [Introduction](#introduction)
+ - [Environment Setup](#environment-setup)
+ - [Training](#training)
+ - [Evaluation](#evaluation)
+
+ ## Introduction
+ We propose **Elastic Reasoning**, a novel framework for scalable chain of thoughts
+ that explicitly separates reasoning into two phases, `thinking` and `solution`, with
+ independently allocated budgets. At test time, Elastic Reasoning prioritizes the
+ completeness of solution segments, significantly improving reliability under tight
+ resource constraints. To train models that are robust to truncated thinking, we
+ introduce a lightweight `budget-constrained rollout` strategy, integrated into GRPO,
+ which teaches the model to reason adaptively when the thinking process is cut
+ short and generalizes effectively to unseen budget constraints without additional
+ training.
+ <p align="center">
+ <img src="figs/framework.png" width="80%" />
+ </p>
+
+
+ **Main Takeaways**
+ 1. ✂️ Thinking + Solution are explicitly separated with independent budgets, boosting reliability under tight compute constraints.
+ 2. 🧠 Budget-Constrained Rollout: We train models to handle truncated reasoning using GRPO.
+ 3. 📈 Flexible scalability: Robust performance across diverse inference budgets on reasoning benchmarks like AIME and LiveCodeBench.
+ 4. ⚙️ Better performance with fewer tokens: Our trained model generates outputs that are 30% shorter while maintaining (or even improving) accuracy.
+
+ <p align="center">
+ <img src="figs/aime.png" width="46%" />
+ <img src="figs/livecode.png" width="48%" />
+ </p>
+
+ <p align="center">
+ <img src="figs/codetable.png" width="90%" />
+ </p>
+
+ ## Environment Setup
+
+
+ ### Installation
+ ```bash
+ # Create and activate a Python 3.10 environment.
+ conda create -n e1 python=3.10 -y
+ conda activate e1
+
+ # Install dependencies.
+ cd Elastic-Reasoning
+ pip install -e ./verl
+ pip install -e .
+ ```
+ ### Data
+ Our raw training data is in `rllm/data/[train|test]/[code|math]/`, along with preprocessing scripts in `rllm/data/preprocess`. To convert the raw data into Parquet files for training, run:
+
+ ```bash
+ # Download datasets from GDrive; populates rllm/data/[train|test]/[math|code]/*.json
+ python scripts/data/download_datasets.py
+
+ # Generate Parquet files for DeepCoder/DeepScaleR in data/*.parquet
+ python scripts/data/[deepcoder|deepscaler]_dataset.py
+ ```
+ ## Training
+ ```bash
+ export MODEL_PATH="agentica-org/DeepScaleR-1.5B-Preview"
+ ./scripts/e1-math/e1_math_1.5b_1k_1k.sh --model $MODEL_PATH
+ ```
+
+ ## Evaluation
+
+ To evaluate a checkpoint, run:
+ ```bash
+ ./scripts/eval/eval_model.sh --model [CHECKPOINT_PATH] --datasets [DATASET1] [DATASET2] --output-dir [OUTPUT_DIR] --n [N_PASSES] --tp [TENSOR_PARALLEL_SIZE] --e1-mode [SEPARATE_BUDGETING] --e1-thinking-length [THINKING_LENGTH] --e1-solution-length [SOLUTION_LENGTH]
+ ```
+
+ ### Example on MATH
+ ```bash
+ ./scripts/eval/eval_model.sh --model Salesforce/E1-Math-1.5B --datasets aime math amc minerva olympiad_bench --output-dir $HOME/E1-Math-1.5B --tp 1 --n 16 --e1-mode True --e1-thinking-length 1024 --e1-solution-length 1024
+ ```
+ ### Example on LiveCodeBench
+ ```bash
+ ./scripts/eval/eval_model.sh --model Salesforce/E1-Code-14B --datasets test_livecodebench --output-dir $HOME/E1-Code-14B --tp 4 --e1-mode True --e1-thinking-length 1024 --e1-solution-length 1024
+ ```
+
+ ### Example on Codeforces
+ ```bash
+ ./scripts/eval/eval_model.sh --model Salesforce/E1-Code-14B --datasets test_codeforces --output-dir $HOME/DeepCoder-14B-Preview --tp 4 --n 8 --e1-mode True --e1-thinking-length 1024 --e1-solution-length 1024
+ ```
+ ```bash
+ python scripts/deepcoder/benchmark/cf_elo_calc.py --results_path [RESULTS_JSON_PATH] --pass_n 8
+ ```
+
+ ### Unconstrained evaluation
+ Set `--e1-mode False` and `--max-length [Maximum token length, e.g. 32768]`.
+
+
+ ## Acknowledgement
+ We thank [rllm](https://github.com/agentica-project/rllm) and [verl](https://github.com/volcengine/verl) for providing the awesome codebase!
+
+ ## Citation
+
+
+ ```bibtex
+ @article{xu2025scalable,
+   title={Scalable Chain of Thoughts via Elastic Reasoning},
+   author={Xu, Yuhui and Dong, Hanze and Wang, Lei and Sahoo, Doyen and Li, Junnan and Xiong, Caiming},
+   journal={arXiv preprint arXiv:2505.05315},
+   year={2025}
+ }
+ ```
     """)
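
The separate budgeting described in the README's Introduction (a thinking phase and a solution phase, each with its own token limit) can be illustrated with a small decoding loop. This is only a sketch under stated assumptions, not the repository's implementation: `generate_with_budgets`, `toy_model`, the `</think>` delimiter, and the `<eos>` token are hypothetical names chosen for illustration, and the "model" is a deterministic stand-in rather than an LLM.

```python
from typing import Callable, List

# Assumed delimiter that closes the thinking segment; the README does not name it.
END_OF_THINKING = "</think>"
# Assumed end-of-sequence token for this toy example.
EOS = "<eos>"


def generate_with_budgets(
    next_token: Callable[[List[str]], str],
    prompt: List[str],
    thinking_budget: int,
    solution_budget: int,
) -> List[str]:
    """Decode with independent token budgets for thinking and solution."""
    output: List[str] = []

    # Phase 1: thinking, capped at thinking_budget tokens.
    for _ in range(thinking_budget):
        token = next_token(prompt + output)
        if token == END_OF_THINKING:
            break
        output.append(token)

    # Whether the model finished thinking or was cut off, close the
    # thinking segment so that the solution phase always gets to run.
    output.append(END_OF_THINKING)

    # Phase 2: solution, with its own budget.
    for _ in range(solution_budget):
        token = next_token(prompt + output)
        if token == EOS:
            break
        output.append(token)
    return output


def toy_model(context: List[str]) -> str:
    """Hypothetical stand-in for an LLM: five thoughts, then a two-token answer."""
    if END_OF_THINKING not in context:
        thoughts = sum(1 for t in context if t.startswith("thought"))
        return f"thought{thoughts}" if thoughts < 5 else END_OF_THINKING
    answers = len(context) - context.index(END_OF_THINKING) - 1
    return f"answer{answers}" if answers < 2 else EOS


if __name__ == "__main__":
    # Thinking is truncated after 3 tokens, yet the solution is still complete.
    print(generate_with_budgets(toy_model, ["question"], 3, 4))
    # ['thought0', 'thought1', 'thought2', '</think>', 'answer0', 'answer1']
```

With a thinking budget of 3 the toy model is cut off mid-thought, but the forced delimiter still leaves the full solution budget available, which is the behaviour the `--e1-thinking-length` and `--e1-solution-length` flags in the evaluation commands appear to control.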