---
language: en
license: apache-2.0
tags:
- verl
- grpo
- math
- reasoning
- rl
- lora
- peft
base_model: google/gemma-2-9b-it
library_name: peft
---
# thejaminator/grpo-feature-vector-step-1
This is a LoRA adapter trained using verl with GRPO (Group Relative Policy Optimization) on math reasoning tasks.
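The snippet below is a minimal loading sketch, assuming the adapter is published under the repo id above and that `transformers` and `peft` are installed (`device_map="auto"` additionally requires `accelerate`); the prompt is only an illustrative placeholder.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "google/gemma-2-9b-it"
adapter_id = "thejaminator/grpo-feature-vector-step-1"

# Load the base model and tokenizer, then attach the LoRA adapter.
base_model = AutoModelForCausalLM.from_pretrained(
    base_id,
    torch_dtype="auto",
    device_map="auto",  # requires accelerate
)
tokenizer = AutoTokenizer.from_pretrained(base_id)
model = PeftModel.from_pretrained(base_model, adapter_id)

# Illustrative math prompt; not taken from the training data.
prompt = "Solve: if 3x + 5 = 20, what is x? Show your reasoning."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```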
## Training Details
- Base model: google/gemma-2-9b-it
- Framework: verl GRPO
- Training steps: 1
- Dataset: Math reasoning problems
- Batch size: 8
- Learning rate: 5e-05
- LoRA rank: 64
- LoRA alpha: 128.0
- Number of generations: 16
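For reference, the rank and alpha listed above map onto a `peft` `LoraConfig` roughly as sketched below; the target modules and dropout are assumptions rather than values taken from the verl run configuration.

```python
from peft import LoraConfig

# Sketch of a LoraConfig matching the listed hyperparameters.
lora_config = LoraConfig(
    r=64,              # LoRA rank (from the list above)
    lora_alpha=128,    # LoRA alpha (from the list above)
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumed
    lora_dropout=0.0,  # assumed
    task_type="CAUSAL_LM",
)
```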
Generated from the verl LoRA checkpoint: `/workspace/verl_outputs_feature_vector/global_step_1/actor/lora_adapter`