---
language: en
license: apache-2.0
tags:
- verl
- grpo
- math
- reasoning
- rl
- lora
- peft
base_model: google/gemma-2-9b-it
library_name: peft
---

# thejaminator/grpo-feature-vector-step-1

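This is a LoRA adapter for [google/gemma-2-9b-it](https://huggingface.co/google/gemma-2-9b-it), trained with [verl](https://github.com/volcengine/verl) using GRPO (Group Relative Policy Optimization) on math reasoning tasks. It is an early checkpoint, saved after a single training step.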
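GRPO drops the learned value model used in PPO and instead scores each sampled completion relative to the other completions for the same prompt. The sketch below shows only that group-relative advantage step, following the formulation in the DeepSeekMath paper that introduced GRPO. It is not verl's code: the function name `group_relative_advantages`, the use of the population standard deviation, the `eps` value, and the 0/1 rewards in the example are assumptions made for illustration. The policy loss itself (clipped probability ratio and KL term) is omitted.

```python
import statistics
from typing import List, Sequence


def group_relative_advantages(rewards: Sequence[float], eps: float = 1e-6) -> List[float]:
    """GRPO-style outcome advantage for one prompt's group of completions (hypothetical helper).

    Each completion i gets A_i = (r_i - mean(r)) / (std(r) + eps); every token of
    that completion shares the same advantage. Population std is an assumption here;
    other implementations may use the sample std or a different eps.
    """
    if not rewards:
        raise ValueError("need at least one reward")
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


# Hypothetical rewards for one prompt with 16 sampled completions
# (the group size listed under Training Details): 4 correct (1.0), 12 wrong (0.0).
rewards = [1.0] * 4 + [0.0] * 12
advantages = group_relative_advantages(rewards)
```

With 4 of 16 completions correct, each correct one gets an advantage of about +1.73 and each incorrect one about -0.58, so the update favors the correct answers. If every completion in a group receives the same reward, all advantages are zero and that prompt contributes no policy-gradient signal.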

## Training Details

- Base model: google/gemma-2-9b-it
- Framework: verl (GRPO)
- Training steps: 1 (checkpoint `global_step_1`)
- Dataset: Math reasoning problems
- Batch size: 8
- Learning rate: 5e-05
- LoRA rank: 64
- LoRA alpha: 128.0
- Number of generations per prompt (GRPO group size): 16
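The LoRA rank and alpha above set how strongly the adapter modifies the base weights. Assuming standard LoRA scaling as in PEFT (not rsLoRA), each adapted weight becomes `W + (alpha / r) * B @ A`, so rank 64 with alpha 128.0 gives a scaling factor of 2.0. The plain-Python sketch below uses toy matrices; the helper names are hypothetical, and the adapter's `adapter_config.json` is where the actual `r`, `lora_alpha`, and `use_rslora` settings would be recorded.

```python
from typing import List

Matrix = List[List[float]]


def matmul(x: Matrix, y: Matrix) -> Matrix:
    """Plain-Python matrix product x @ y."""
    inner = len(y)
    return [
        [sum(x[i][k] * y[k][j] for k in range(inner)) for j in range(len(y[0]))]
        for i in range(len(x))
    ]


def lora_scaling(alpha: float, rank: int) -> float:
    """Standard LoRA scaling alpha / r (rsLoRA would use alpha / sqrt(r) instead)."""
    return alpha / rank


def merge_lora(w: Matrix, a: Matrix, b: Matrix, alpha: float, rank: int) -> Matrix:
    """Return W + (alpha / r) * B @ A (hypothetical helper).

    Shapes: W is (out, in), A is (r, in), B is (out, r).
    """
    if len(a) != rank or any(len(row) != rank for row in b):
        raise ValueError("A must have r rows and B must have r columns")
    delta = matmul(b, a)  # (out, in) low-rank update
    s = lora_scaling(alpha, rank)
    return [
        [w[i][j] + s * delta[i][j] for j in range(len(w[0]))]
        for i in range(len(w))
    ]


# This card's settings: rank 64, alpha 128.0 -> scaling factor 2.0.
card_scaling = lora_scaling(128.0, 64)

# Toy rank-1 example with the same alpha / r ratio (2.0 / 1).
w = [[1.0, 0.0], [0.0, 1.0]]
a = [[1.0, 2.0]]    # (r=1, in=2)
b = [[0.5], [0.0]]  # (out=2, r=1)
merged = merge_lora(w, a, b, alpha=2.0, rank=1)
```

Because the update is scaled by `alpha / r`, changing the rank without adjusting alpha also changes the effective step size of the adapter, which is why both values are reported together.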


	Generated from verl LoRA checkpoint: `/workspace/verl_outputs_feature_vector/global_step_1/actor/lora_adapter`