---
datasets:
- nvidia/OpenCodeReasoning-2
base_model:
- openai/gpt-oss-20b
library_name: transformers
tags:
- code-reasoning
- vllm
pipeline_tag: text-generation
---
<img src="gpt-oss-reasoning.png" width="700"/>

### Overview
- Base model: `openai/gpt-oss-20b`
- Objective: Supervised fine-tuning for competitive programming and algorithmic reasoning
- Dataset: `nvidia/OpenCodeReasoning-2` (OCR-2), combining `python` and `cpp` splits. Each sample reconstructs the upstream question and uses the dataset's `r1_generation` as the assistant response
- Context length: 4096 tokens
- Training method: LoRA SFT via TRL `SFTTrainer`
### Intended Use
- Intended: Generating Python/C++ solutions and reasoning for competitive programming tasks
- Out of scope: Safety-critical applications. The model may hallucinate or produce incorrect or inefficient code
### Prompt Format
This model was trained in a chat format. Recommended structure:
```python
messages = [
    {"role": "system", "content": "You are an expert competitive programmer. Read the problem and produce a correct, efficient solution. Include reasoning if helpful."},
    {"role": "user", "content": problem_text},
]
prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)
```
If you prefer plain text, place the problem text after a brief instruction, but chat format generally yields better results.
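For the plain-text route, a minimal sketch could look like the following. The helper name `build_plain_prompt`, the instruction wording, and the `Problem:`/`Solution:` layout are all illustrative assumptions, not a format the model is known to have been trained on.

```python
def build_plain_prompt(problem_text: str) -> str:
    """Place a brief instruction before the problem text (no chat template)."""
    # Hypothetical instruction; adjust the wording to your task.
    instruction = (
        "You are an expert competitive programmer. "
        "Solve the following problem with a correct, efficient solution."
    )
    # Assumed layout: instruction, blank line, problem statement, then a cue for the answer.
    return f"{instruction}\n\nProblem:\n{problem_text.strip()}\n\nSolution:\n"
```

The returned string can be passed to the tokenizer directly in place of the chat-templated prompt.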
### Reasoning Effort
Set the reasoning effort through the `reasoning_effort` argument of `apply_chat_template`. Supported values are `"low"`, `"medium"` (the default), and `"high"`:
```python
# Assumes `tokenizer` and `model` are already loaded (see Quick Start).
messages = [
    {"role": "system", "content": "Always respond in riddles"},
    {"role": "user", "content": "Explain why the meaning of life is 42"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
    return_dict=True,
    reasoning_effort="high",
).to(model.device)
generated = model.generate(**inputs, max_new_tokens=500)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(generated[0][inputs["input_ids"].shape[-1]:]))
```
### Quick Start (Transformers)
```python
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "GetSoloTech/gpt-oss-code-reasoning-20b"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",  # string, not a bare name: use the checkpoint's dtype
    device_map="auto",
)

problem_text = """
You are given an array of integers ... (your problem here)
"""
messages = [
    {"role": "system", "content": "You are an expert competitive programmer. Read the problem and produce a correct, efficient solution. Include reasoning if helpful."},
    {"role": "user", "content": problem_text},
]

# Render the chat template to a string, then tokenize it.
input_text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    reasoning_effort="medium",
)
inputs = tokenizer([input_text], return_tensors="pt").to(model.device)

outputs = model.generate(
    **inputs,
    max_new_tokens=768,
    do_sample=True,  # needed for temperature/top_p to take effect
    temperature=0.3,
    top_p=0.9,
    repetition_penalty=1.1,
)
# Decode only the newly generated tokens, not the echoed prompt.
new_tokens = outputs[0][inputs["input_ids"].shape[-1]:]
print(tokenizer.decode(new_tokens, skip_special_tokens=True))
```
### Generation Tips
- Reasoning style: Lower temperature (0.2–0.5) for clearer step-by-step reasoning
- Length: Use `max_new_tokens` 512–1024 for full solutions; shorter for hints
- Final code only: If you only want the final code, post-process the model output to extract the last fenced code block
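The post-processing tip above can be sketched with a small regular expression. This is one possible approach, assuming the model wraps code in standard Markdown fences. The names `CODE_BLOCK` and `extract_last_code_block` are illustrative and not part of any library.

```python
import re

# Matches a fenced block: three backticks, optional language tag, body, three backticks.
CODE_BLOCK = re.compile(r"`{3}[ \t]*([\w+#-]*)[^\n]*\n(.*?)`{3}", re.DOTALL)


def extract_last_code_block(text: str):
    """Return (language, code) for the last fenced block, or None if there is none."""
    matches = CODE_BLOCK.findall(text)
    if not matches:
        return None
    language, code = matches[-1]
    return language.lower(), code.strip()
```

Calling `extract_last_code_block(decoded_output)` on the decoded generation yields the language tag (possibly empty) and the code of the final block. Taking the last block rather than the first is a heuristic: reasoning-style outputs often contain draft snippets before the final answer.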
### Dataset Construction Notes
- Source: `nvidia/OpenCodeReasoning-2` with `python` and `cpp` splits
- For each split, the script:
  - Shuffles and selects up to `--take_samples` examples per split
  - Reconstructs the problem statement from upstream benchmarks (TACO, APPS, DeepMind CodeContests, `open-r1/codeforces`)
  - Filters out rows with missing/empty questions or assistant responses
  - Builds chat-style `messages` and a formatted `text` field with the tokenizer's chat template
- The final training set is the concatenation of both splits, followed by an optional `train_test_split` according to `--eval_ratio`
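The steps above can be approximated in plain Python as follows. This is a simplified sketch, not the actual training script: it assumes each row already carries a reconstructed `question` field (a hypothetical name) next to `r1_generation`, omits the system message and the chat-templated `text` field (which needs the tokenizer), and uses a hand-rolled split as a stand-in for `train_test_split`.

```python
import random


def build_examples(rows, take_samples, seed=0):
    """Shuffle one split, keep up to take_samples rows, drop empty ones, build messages."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    examples = []
    for row in rows[:take_samples]:
        question = (row.get("question") or "").strip()  # hypothetical field name
        answer = (row.get("r1_generation") or "").strip()
        if not question or not answer:
            continue  # skip rows with a missing/empty question or response
        examples.append({"messages": [
            {"role": "user", "content": question},
            {"role": "assistant", "content": answer},
        ]})
    return examples


def split_train_eval(examples, eval_ratio, seed=0):
    """Optional hold-out split; a plain-Python stand-in for `train_test_split`."""
    examples = list(examples)
    random.Random(seed).shuffle(examples)
    n_eval = int(len(examples) * eval_ratio)
    return examples[n_eval:], examples[:n_eval]
```

Concatenating `build_examples(python_rows, n)` and `build_examples(cpp_rows, n)` and then calling `split_train_eval` mirrors the order described above: per-split sampling and filtering first, then a single split over the combined set.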
### Acknowledgements
- Unsloth (`FastLanguageModel`) for efficient 4-bit loading and fast PEFT
- TRL (`SFTTrainer`) for straightforward supervised fine-tuning
- NVIDIA OpenCodeReasoning-2 and upstream benchmarks (TACO, APPS, CodeContests, `open-r1/codeforces`)