# GRPO-Trained Qwen 2.5 LoRA
This is a GRPO (Group Relative Policy Optimization) fine-tuned LoRA adapter for Qwen 2.5.
## Model Details
- Base model: Qwen/Qwen2.5
- Training method: GRPO
- LoRA rank: 16
- Training objectives: XML formatting, correctness, and structured output (see the training sketch after this list)
- Use case: Generating well-formatted, structured outputs with improved accuracy
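The actual training script is not published with this card. The following is a minimal sketch of a comparable setup, assuming trl's `GRPOTrainer` with a peft `LoraConfig` at rank 16; the reward function, dataset, and hyperparameters shown are illustrative placeholders, not the ones used for this adapter.

```python
# Minimal GRPO training sketch (illustrative; not the actual training script).
# Assumes trl's GRPOTrainer and peft's LoraConfig; the reward function and
# dataset below are placeholders for the XML-formatting/correctness objectives.
from datasets import Dataset
from peft import LoraConfig
from trl import GRPOConfig, GRPOTrainer

def xml_format_reward(completions, **kwargs):
    # Hypothetical reward: 1.0 if the completion is wrapped in <answer> tags.
    return [1.0 if "<answer>" in c and "</answer>" in c else 0.0
            for c in completions]

# Placeholder prompt dataset; GRPOTrainer expects a "prompt" column.
train_dataset = Dataset.from_dict(
    {"prompt": ["Answer inside <answer></answer> tags: what is 2 + 2?"]}
)

trainer = GRPOTrainer(
    model="Qwen/Qwen2.5",  # base model ID from this card
    reward_funcs=xml_format_reward,
    args=GRPOConfig(output_dir="qwen2.5-grpo-lora"),
    train_dataset=train_dataset,
    peft_config=LoraConfig(r=16, lora_alpha=32, task_type="CAUSAL_LM"),
)
trainer.train()
```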
## Usage

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the base model and tokenizer
model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5")
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5")

# Load the LoRA adapter
model = PeftModel.from_pretrained(model, "yashwanthjanke/qwen2.5-grpo-lora")

# Use the model
input_text = "Your prompt here"
inputs = tokenizer(input_text, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=100)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
```
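To serve the model without loading the adapter separately, the LoRA weights can be merged into the base model. A minimal sketch continuing from the snippet above, using peft's `merge_and_unload` (the output directory name is illustrative):

```python
# Merge the LoRA weights into the base model for standalone deployment.
merged_model = model.merge_and_unload()
merged_model.save_pretrained("qwen2.5-grpo-merged")  # illustrative path
tokenizer.save_pretrained("qwen2.5-grpo-merged")
```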