thejaminator's picture
verl GRPO trained model at step 1
1c54e58 verified
metadata
language: en
license: apache-2.0
tags:
  - verl
  - grpo
  - math
  - reasoning
  - rl
  - lora
  - peft
base_model: google/gemma-2-9b-it
library_name: peft

thejaminator/grpo-feature-vector-step-1

This is a LoRA adapter trained using verl with GRPO (Group Relative Policy Optimization) on math reasoning tasks.

Training Details

  • Base model: google/gemma-2-9b-it
  • Framework: verl GRPO
  • Training steps: 1
  • Dataset: Math reasoning problems
  • Batch size: 8
  • Learning rate: 5e-05
  • LoRA rank: 64
  • LoRA alpha: 128.0
  • Number of generations: 16

Generated from verl LoRA checkpoint: /workspace/verl_outputs_feature_vector/global_step_1/actor/lora_adapter