---
datasets: open-r1/openr1-220k-math
library_name: transformers
model_name: OpenR1-Qwen-7B
tags:
- generated_from_trainer
- trl
- sft
license: apache-2.0
---
# OpenR1-Qwen-7B
This is a fine-tune of [Qwen2.5-Math-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-Math-7B-Instruct) on the `default` split of [OpenR1-220k-Math](https://huggingface.co/datasets/open-r1/openr1-220k-math).
> [!NOTE]
> Check out [OpenR1-Distill-7B](https://huggingface.co/open-r1/OpenR1-Distill-7B) for an improved model that was trained on [open-r1/Mixture-of-Thoughts](https://huggingface.co/datasets/open-r1/Mixture-of-Thoughts) and replicates the performance of [DeepSeek-R1-Distill-Qwen-7B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-7B) across multiple reasoning domains.
## Quick start
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "open-r1/OpenR1-Qwen-7B"

# Load the model in its native dtype and place it on the available GPUs
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

prompt = "Find the value of $x$ that satisfies the equation $4x+5 = 6x+7$."
messages = [
    {"role": "system", "content": "Please reason step by step, and put your final answer within \\boxed{}."},
    {"role": "user", "content": prompt},
]

# Render the chat template and generate a step-by-step solution
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=4096)
print(tokenizer.decode(output_ids[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```
## Training
We train the model on the `default` split of [OpenR1-220k-Math](https://huggingface.co/datasets/open-r1/openr1-220k-math) for 3 epochs. We use a learning rate of 5e-5 with a linear schedule and a 10% warmup phase, and extend the context length from 4k to 32k by increasing the RoPE frequency to 300k.
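This recipe maps onto a standard TRL SFT run. The snippet below is a minimal sketch, not the exact training script used for the release; it assumes a recent `trl` version, and the `rope_theta` config override and the `max_length` argument are illustrative and may be named differently across library versions:

```python
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import SFTConfig, SFTTrainer

# Raise the RoPE base frequency so the base model's context window
# can be stretched to 32k-token reasoning traces (assumed override;
# the actual run may patch the model config directly).
model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-Math-7B-Instruct",
    torch_dtype="auto",
    rope_theta=300000.0,
)
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-Math-7B-Instruct")

dataset = load_dataset("open-r1/openr1-220k-math", "default", split="train")

# Hyperparameters from the description above: 3 epochs, lr 5e-5,
# linear schedule with 10% warmup, 32k sequence length.
training_args = SFTConfig(
    output_dir="OpenR1-Qwen-7B",
    num_train_epochs=3,
    learning_rate=5e-5,
    lr_scheduler_type="linear",
    warmup_ratio=0.1,
    max_length=32768,  # named max_seq_length in older TRL releases
)

trainer = SFTTrainer(
    model=model,
    args=training_args,
    train_dataset=dataset,
    processing_class=tokenizer,
)
trainer.train()
```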
The authoritative training and evaluation code lives at https://github.com/huggingface/open-r1/. The table below compares OpenR1-Qwen-7B to [DeepSeek-R1-Distill-Qwen-7B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-7B) and [OpenThinker-7B](https://huggingface.co/open-thoughts/OpenThinker-7B), evaluated with [lighteval](https://github.com/huggingface/open-r1/tree/main?tab=readme-ov-file#evaluating-models).
| Model | MATH-500 | AIME 2024 | AIME 2025 | GPQA-D |
|--------------------------|----------|-----------|-----------|--------|
| DeepSeek-R1-Distill-Qwen-7B | 93.5 | 51.3 | 35.8 | 52.4 |
| OpenR1-Qwen-7B | 90.6 | 47.0 | 33.2 | 42.4 |
| OpenThinker-7B | 86.4 | 31.3 | 24.6 | 39.1 |