Interplay-LM-Reasoning/extrapolation_midtrain

Add pipeline tag, GitHub link, and improved model description

by nielsr HF Staff - opened 17 days ago

base: refs/heads/main ← from: refs/pr/1

README.md +23 -1

@@ -1,5 +1,6 @@
 ---
 license: mit
 ---
 <h1 align="center">
@@ -23,8 +24,29 @@ Carnegie Mellon University, Language Technologies Institute
 </div>
-This repository contains mid-training related checkpoints in the extrapolation tasks.
 ## 📚 Citation

 ---
 license: mit
+pipeline_tag: text-generation
 ---
 <h1 align="center">
 </div>
+## Does Reinforcement Learning Truly Extend Reasoning?
+This work examines conflicting views on whether RL extends the reasoning abilities of language models. Some characterize RL as a capability refiner, while others see it as inducing new compositional skills. The disagreement stems from a lack of control in modern training pipelines, which this work addresses through controlled analysis. This repository contains the mid-training checkpoints for the extrapolation tasks.
+## 🔍 Overview
+Our paper builds a fully controlled experimental framework to analyze how pre-training, mid-training, and RL-based post-training jointly shape the reasoning abilities of language models. Using synthetic math-style reasoning tasks with explicit atomic operations and process-verifiable reasoning traces, we study:
+*   **Extrapolative generalization** to more complex compositions (deeper dependency graphs).
+*   **Contextual generalization** across diverse surface forms and linguistic contexts.
+*   How **RL interacts** with prior knowledge, and when it yields **genuine capability gains** beyond pre-training.
+## 🧠 Key findings
+<div align="center">
+  <h1 align="center">
+    <img src="assets/findings.png" width="500" />
+  </h1>
+</div>
+A comic summarizing the work, generated by NotebookLM, is available [here](assets/Interplay-LM-Reasoning.pdf).
+## Code
+The code and data for this work will be released soon at the following GitHub repository: [https://github.com/Interplay-LM-Reasoning/Interplay-LM-Reasoning](https://github.com/Interplay-LM-Reasoning/Interplay-LM-Reasoning)
 ## 📚 Citation