
GRPO-Trained Qwen 2.5 LoRA Adapter

This is a GRPO (Group Relative Policy Optimization) fine-tuned version of Qwen 2.5, trained as a LoRA adapter.

Model Details

  • Base model: Qwen/Qwen2.5
  • Training method: GRPO
  • LoRA rank: 16
  • Training objectives: XML formatting, correctness, and structured output (see the reward sketch below)
  • Use case: generating well-formatted, structured outputs with improved answer correctness
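
GRPO typically optimizes the model against programmatic reward functions. The following is a minimal sketch of what an XML-formatting reward might look like; the tag names, reward value, and function signature are illustrative assumptions, not the published training code.

import re

def xml_format_reward(completion: str) -> float:
    """Toy reward: score a completion higher when it follows a
    <reasoning>...</reasoning> then <answer>...</answer> layout.
    The tag schema here is assumed for illustration only."""
    pattern = r"<reasoning>.*?</reasoning>\s*<answer>.*?</answer>"
    return 0.5 if re.search(pattern, completion, flags=re.DOTALL) else 0.0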

Usage

from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the base model
model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5")
# Load the LoRA adapter
model = PeftModel.from_pretrained(model, "yashwanthjanke/qwen2.5-grpo-lora")
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5")

# Use the model
input_text = "Your prompt here"
inputs = tokenizer(input_text, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=100)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
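
For deployment without the PEFT wrapper, the LoRA weights can be folded into the base model using PEFT's merge_and_unload. This continues from the snippet above; the output directory name is illustrative.

# Merge the adapter weights into the base model and save a standalone copy
merged_model = model.merge_and_unload()        # returns the base model with LoRA weights merged in
merged_model.save_pretrained("qwen2.5-grpo-merged")   # illustrative output path
tokenizer.save_pretrained("qwen2.5-grpo-merged")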