edoarc commited on
Commit
5e61ec4
·
verified ·
1 Parent(s): 7ba0673

Create README.md

Browse files
Files changed (1) hide show
  1. README.md +30 -0
README.md ADDED
@@ -0,0 +1,30 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ ---
2
+ license: apache-2.0
3
+ language:
4
+ - en
5
+ - zh
6
+ base_model:
7
+ - Qwen/Qwen2.5-7B-Instruct
8
+ pipeline_tag: text-generation
9
+ tags:
10
+ - reasoning
11
+ - reinforcement
12
+ - learning
13
+ - RLT
14
+ - math
15
+ - science
16
+ - code
17
+ ---
18
+
19
+ # Reinforcement-Learned Teacher Student 7B
20
+
21
+ This repository contains a 7B parameter student model trained using the **Reinforcement-Learned Teachers (RLT)** pipeline introduced in our paper [Reinforcement Learning Teachers](https://arxiv.org/abs/2506.08388).
22
+
23
+ ## Model Description
24
+
25
+ The 7B RLT student is distilled from a 7B Reinforcement-Learned Teacher, which has been explicitly trained to produce high-quality reasoning traces optimized for student distillation.
26
+
27
+ ## Usage
28
+
29
+ The model was trained with supervised fine-tuning using the same hyperparameters, the system prompt, and the reasoning tags from [Li et al. 2025](https://arxiv.org/pdf/2502.07374).
30
+ Evaluation was conducted using the [SkyThought](https://github.com/NovaSky-AI/SkyThought) library at commit `4bb8f3e`. Please refer to our [repository](https://github.com/SakanaAI/RLT) and [paper](https://arxiv.org/abs/2506.08388) for details and results.