---
datasets:
- nvidia/OpenCodeReasoning-2
base_model:
- openai/gpt-oss-20b
library_name: transformers
tags:
- code-reasoning
- vllm
pipeline_tag: text-generation
---

<img src="gpt-oss-reasoning.png" width="700"/>

### Overview

- Base model: `openai/gpt-oss-20b`
- Objective: Supervised fine-tuning for competitive programming and algorithmic reasoning
- Dataset: `nvidia/OpenCodeReasoning-2` (OCR-2), combining the `python` and `cpp` splits. Each sample reconstructs the upstream question and uses the dataset's `r1_generation` as the assistant response
- Context length: 4096 tokens
- Training method: LoRA SFT via TRL's `SFTTrainer` (see the sketch below)
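
For reference, here is a minimal sketch of how such a LoRA SFT run can be set up with TRL. The hyperparameters (rank, learning rate, batch size) are illustrative assumptions rather than the exact training recipe, and the sketch omits the Unsloth 4-bit loading mentioned under Acknowledgements:

```python
from datasets import Dataset
from peft import LoraConfig
from trl import SFTConfig, SFTTrainer

# Toy stand-in: the real run uses the OCR-2-derived chat dataset
# described under "Dataset Construction Notes" below.
train_dataset = Dataset.from_list([
    {"messages": [
        {"role": "user", "content": "Problem statement ..."},
        {"role": "assistant", "content": "Reasoned solution ..."},
    ]},
])

peft_config = LoraConfig(
    r=16,                 # assumed rank; not the published recipe
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)

trainer = SFTTrainer(
    model="openai/gpt-oss-20b",
    train_dataset=train_dataset,
    peft_config=peft_config,
    args=SFTConfig(
        max_length=4096,  # matches the 4096-token training context (older TRL: max_seq_length)
        per_device_train_batch_size=1,
        gradient_accumulation_steps=8,
        learning_rate=2e-4,
        num_train_epochs=1,
        output_dir="gpt-oss-code-reasoning-20b-lora",
    ),
)
trainer.train()
```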

### Intended Use

- Intended: Generating Python/C++ solutions and reasoning for competitive programming tasks
- Out of scope: Safety-critical applications; the model may hallucinate or produce incorrect or inefficient code

### Prompt Format

This model was trained in a chat format. Recommended structure:

```python
messages = [
    {"role": "system", "content": "You are an expert competitive programmer. Read the problem and produce a correct, efficient solution. Include reasoning if helpful."},
    {"role": "user", "content": problem_text},
]

prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)
```

If you prefer plain text, place the problem text after a brief instruction, but chat format generally yields better results.
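
A minimal plain-text alternative might look like this (the instruction wording here is just an illustration):

```python
# Plain-text prompting; the chat format above is still preferred.
instruction = "Solve the following competitive programming problem. Explain your reasoning, then give the final solution in a single code block.\n\n"
prompt = instruction + problem_text
```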

### Reasoning Effort

Specify the reasoning effort in `apply_chat_template` (supported values: `"low"`, `"medium"` (default), or `"high"`):

```python
messages = [
    {"role": "system", "content": "Always respond in riddles"},
    {"role": "user", "content": "Explain why the meaning of life is 42"},
]

inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
    return_dict=True,
    reasoning_effort="high",
).to(model.device)

generated = model.generate(**inputs, max_new_tokens=500)
print(tokenizer.decode(generated[0][inputs["input_ids"].shape[-1]:]))
```

### Quick Start (Transformers)

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "GetSoloTech/gpt-oss-code-reasoning-20b"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",
    device_map="auto",
)

problem_text = """
You are given an array of integers ... (your problem here)
"""

messages = [
    {"role": "system", "content": "You are an expert competitive programmer. Read the problem and produce a correct, efficient solution. Include reasoning if helpful."},
    {"role": "user", "content": problem_text},
]

input_text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    reasoning_effort="medium",
)

inputs = tokenizer([input_text], return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=768,
    temperature=0.3,
    top_p=0.9,
    repetition_penalty=1.1,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
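
The model is also tagged for vLLM. A minimal offline-generation sketch is below, assuming your installed vLLM version supports the gpt-oss architecture:

```python
# Minimal vLLM sketch (assumes a vLLM build with gpt-oss support).
from vllm import LLM, SamplingParams

llm = LLM(model="GetSoloTech/gpt-oss-code-reasoning-20b", max_model_len=4096)
sampling = SamplingParams(temperature=0.3, top_p=0.9, max_tokens=768)

messages = [
    {"role": "system", "content": "You are an expert competitive programmer."},
    {"role": "user", "content": "You are given an array of integers ... (your problem here)"},
]
outputs = llm.chat(messages, sampling)  # applies the model's chat template internally
print(outputs[0].outputs[0].text)
```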

### Generation Tips

- Reasoning style: Lower temperature (0.2–0.5) for clearer step-by-step reasoning
- Length: Use `max_new_tokens` of 512–1024 for full solutions; shorter for hints
- Final code only: If you just want the final code, post-process the model output to extract the last code block (see the sketch below)
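
A simple way to do that post-processing (a sketch; the helper name is just illustrative):

```python
import re

def extract_last_code_block(text: str) -> str | None:
    """Return the contents of the last fenced code block in `text`, if any."""
    blocks = re.findall(r"```(?:\w+)?\n(.*?)```", text, re.DOTALL)
    return blocks[-1].strip() if blocks else None
```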

### Dataset Construction Notes

- Source: `nvidia/OpenCodeReasoning-2` with `python` and `cpp` splits
- For each split, the construction script:
  - Shuffles and selects up to `--take_samples` examples per split
  - Reconstructs the problem statement from the upstream benchmarks (TACO, APPS, DeepMind CodeContests, `open-r1/codeforces`)
  - Filters out rows with missing or empty questions or assistant responses
  - Builds chat-style `messages` and a formatted `text` field with the tokenizer's chat template
- The final training set is the concatenation of both splits, followed by an optional `train_test_split` controlled by `--eval_ratio` (see the sketch below)
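
A simplified sketch of those steps. Config and column names here are assumptions based on the description above, and the rows are assumed to already carry a reconstructed `question` string (the real script rebuilds it from the upstream benchmarks):

```python
from datasets import concatenate_datasets, load_dataset
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("openai/gpt-oss-20b")

def build_split(name: str, take_samples: int = 50_000, seed: int = 42):
    # Config name ("python"/"cpp") and column names are assumed, not verified.
    ds = load_dataset("nvidia/OpenCodeReasoning-2", name, split="train")
    ds = ds.shuffle(seed=seed).select(range(min(take_samples, len(ds))))
    # Drop rows with missing or empty questions/responses.
    ds = ds.filter(lambda ex: bool(ex["question"]) and bool(ex["r1_generation"]))

    def to_chat(ex):
        messages = [
            {"role": "user", "content": ex["question"]},
            {"role": "assistant", "content": ex["r1_generation"]},
        ]
        return {
            "messages": messages,
            "text": tokenizer.apply_chat_template(messages, tokenize=False),
        }

    return ds.map(to_chat)

train = concatenate_datasets([build_split("python"), build_split("cpp")])
splits = train.train_test_split(test_size=0.05, seed=42)  # analogue of --eval_ratio
```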

### Acknowledgements

- Unsloth (`FastLanguageModel`) for efficient 4-bit loading and fast PEFT
- TRL (`SFTTrainer`) for straightforward supervised fine-tuning
- NVIDIA OpenCodeReasoning-2 and the upstream benchmarks (TACO, APPS, CodeContests, `open-r1/codeforces`)

---