Add pipeline tag, GitHub link, and improved model description

#1
by nielsr HF Staff - opened
Files changed (1)
  1. README.md +23 -1
README.md CHANGED
@@ -1,5 +1,6 @@
  ---
  license: mit
  ---

  <h1 align="center">
@@ -23,8 +24,29 @@ Carnegie Mellon University, Language Technologies Institute

  </div>


- This repository contains mid-training related checkpoints in the extrapolation tasks.

  ## 📚 Citation

 
  ---
  license: mit
+ pipeline_tag: text-generation
  ---

  <h1 align="center">
 

  </div>

+ ## Does Reinforcement Learning Truly Extend Reasoning?

+ This work examines conflicting views on whether RL extends the reasoning abilities of language models. Some characterize RL as a capability refiner, while others see it as inducing new compositional skills. The disagreement stems from a lack of control in modern training pipelines. Our work aims to resolve this conflict through controlled analysis. This repository contains the mid-training checkpoints for the extrapolation tasks.
+
+ ## 🔍 Overview
+
+ Our paper builds a fully controlled experimental framework to analyze how pre-training, mid-training, and RL-based post-training jointly shape the reasoning abilities of language models. Using synthetic math-style reasoning tasks with explicit atomic operations and process-verifiable reasoning traces, we study:
+
+ * **Extrapolative generalization** to more complex compositions (deeper dependency graphs).
+ * **Contextual generalization** across diverse surface forms and linguistic contexts.
+ * How **RL interacts** with prior knowledge, and when it yields **genuine capability gains** beyond pre-training.
+
+ ## 🧠 Key findings
+ <div align="center">
+ <h1 align="center">
+ <img src="assets/findings.png" width="500" />
+ </h1>
+ </div>
+ You may also find the comic generated by NotebookLM [here](assets/Interplay-LM-Reasoning.pdf).
+
+ ## Code
+
+ The code and data for this work will be released soon at the following GitHub repository: [https://github.com/Interplay-LM-Reasoning/Interplay-LM-Reasoning](https://github.com/Interplay-LM-Reasoning/Interplay-LM-Reasoning)

  ## 📚 Citation
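
Reviewer note: the Overview mentions synthetic tasks built from atomic operations, dependency graphs of varying depth, and process-verifiable traces. The README does not show what such a task looks like, so here is a minimal sketch of the general idea. Everything in it (the `OPS` table, the `Step` class, `evaluate`, `depth`, `verify_trace`, and the sample problem) is a hypothetical illustration and is not taken from the authors' code or data format.

```python
from dataclasses import dataclass

# Hypothetical atomic operations; the paper's actual operation set is not stated in this README.
OPS = {
    "add": lambda a, b: a + b,
    "sub": lambda a, b: a - b,
    "mul": lambda a, b: a * b,
}


@dataclass(frozen=True)
class Step:
    name: str    # variable defined by this step
    op: str      # key into OPS
    args: tuple  # names of earlier variables (str) or integer literals


def evaluate(steps, inputs):
    """Run the steps in order and return all values, i.e. the reference trace."""
    values = dict(inputs)
    for step in steps:
        operands = [values[a] if isinstance(a, str) else a for a in step.args]
        values[step.name] = OPS[step.op](*operands)
    return values


def depth(steps):
    """Longest dependency chain among the steps, a simple proxy for composition depth."""
    level = {}
    for step in steps:
        parents = [level.get(a, 0) for a in step.args if isinstance(a, str)]
        level[step.name] = 1 + max(parents, default=0)
    return max(level.values(), default=0)


def verify_trace(steps, inputs, claimed):
    """Return the name of the first step whose claimed value differs from the
    reference trace, or None if every step matches."""
    reference = evaluate(steps, inputs)
    for step in steps:
        if claimed.get(step.name) != reference[step.name]:
            return step.name
    return None


# Assumed sample problem: a chain of three dependent steps.
problem = [
    Step("a", "add", ("x", 3)),
    Step("b", "mul", ("a", "y")),
    Step("c", "sub", ("b", "x")),
]
inputs = {"x": 2, "y": 4}
good = {"a": 5, "b": 20, "c": 18}
bad = {"a": 5, "b": 21, "c": 19}
```

In this sketch, `verify_trace(problem, inputs, bad)` points at step `b`, which is the sense in which a trace can be checked per step rather than only by its final answer. Extrapolation in the README's sense would correspond to evaluating on problems whose `depth` exceeds what was seen in training.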