---
datasets: open-r1/openr1-220k-math
library_name: transformers
model_name: OpenR1-Qwen-7B
tags:
- generated_from_trainer
- trl
- sft
license: apache-2.0
---
# OpenR1-Qwen-7B
This is a fine-tune of [Qwen2.5-Math-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-Math-7B-Instruct) on the `default` split of [OpenR1-220k-Math](https://huggingface.co/datasets/open-r1/openr1-220k-math).
> [!NOTE]
> Check out [OpenR1-Distill-7B](https://huggingface.co/open-r1/OpenR1-Distill-7B) for an improved model that was trained on [open-r1/Mixture-of-Thoughts](https://huggingface.co/datasets/open-r1/Mixture-of-Thoughts) and replicates the performance of [DeepSeek-R1-Distill-Qwen-7B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-7B) across multiple reasoning domains.
## Quick start
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "open-r1/OpenR1-Qwen-7B"

# Load the model in its native dtype and place it on the available GPUs
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

prompt = "Find the value of $x$ that satisfies the equation $4x+5 = 6x+7$."
messages = [
    {"role": "system", "content": "Please reason step by step, and put your final answer within \\boxed{}."},
    {"role": "user", "content": prompt},
]

# Render the chat template and generate a step-by-step solution
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=4096)
print(tokenizer.decode(output_ids[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```
## Training
We train the model on the `default` split of [OpenR1-220k-Math](https://huggingface.co/datasets/open-r1/openr1-220k-math) for 3 epochs. We use a learning rate of 5e-5 with a linear schedule and a 10% warmup phase, and extend the context length from 4k to 32k by increasing the RoPE frequency to 300k.
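This recipe maps onto a standard TRL SFT run. The snippet below is a minimal sketch, not the exact training script used for the release; it assumes a recent `trl` version, and the `rope_theta` config override and the `max_length` argument are illustrative and may be named differently across library versions:

```python
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import SFTConfig, SFTTrainer

# Raise the RoPE base frequency so the base model's context window
# can be stretched to 32k-token reasoning traces (assumed override;
# the actual run may patch the model config directly).
model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-Math-7B-Instruct",
    torch_dtype="auto",
    rope_theta=300000.0,
)
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-Math-7B-Instruct")

dataset = load_dataset("open-r1/openr1-220k-math", "default", split="train")

# Hyperparameters from the description above: 3 epochs, lr 5e-5,
# linear schedule with 10% warmup, 32k sequence length.
training_args = SFTConfig(
    output_dir="OpenR1-Qwen-7B",
    num_train_epochs=3,
    learning_rate=5e-5,
    lr_scheduler_type="linear",
    warmup_ratio=0.1,
    max_length=32768,  # named max_seq_length in older TRL releases
)

trainer = SFTTrainer(
    model=model,
    args=training_args,
    train_dataset=dataset,
    processing_class=tokenizer,
)
trainer.train()
```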
The authoritative training and evaluation code lives at https://github.com/huggingface/open-r1/. The table below compares OpenR1-Qwen-7B to [DeepSeek-R1-Distill-Qwen-7B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-7B) and [OpenThinker-7B](https://huggingface.co/open-thoughts/OpenThinker-7B), evaluated with [lighteval](https://github.com/huggingface/open-r1/tree/main?tab=readme-ov-file#evaluating-models).
| Model | MATH-500 | AIME 2024 | AIME 2025 | GPQA-D |
|--------------------------|----------|-----------|-----------|--------|
| DeepSeek-R1-Distill-Qwen-7B | 93.5 | 51.3 | 35.8 | 52.4 |
| OpenR1-Qwen-7B | 90.6 | 47.0 | 33.2 | 42.4 |
| OpenThinker-7B | 86.4 | 31.3 | 24.6 | 39.1 |