This is an experimental RP finetune on top of Qwen3 4B Base. No reasoning data was used; the focus is on general instruction following and RP.

Honestly, I'm not really used to finetuning Qwen models, so if you somehow stumble upon this and decide to try it, I'd really appreciate any feedback you might have, especially if you find issues.


Model Testing

This model has been trained with various structured output formats for both RP and general use. It should therefore be able to follow a structured output format given in the system prompt or in the first message of an RP scenario, at least most of the time; if it doesn't, please leave feedback.
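For instance, a system prompt along these lines (a made-up illustration, not taken from the training data) asks for a fixed per-reply structure:

```
You are {{char}}. Stay in character and reply to {{user}}.
Format every reply exactly as:

*{{char}}'s actions, in italics*
"{{char}}'s spoken dialogue, in quotes"
Mood: one short line describing {{char}}'s current mood
```

The same idea applies outside RP, e.g. asking for JSON or a bullet list in the system prompt.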

Use the ChatML template.
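If you're loading the safetensors weights with transformers, the tokenizer's chat template should already render ChatML, so you shouldn't need to build the prompt by hand. A minimal sketch (the generation settings here are my assumptions, not the author's recommendation; min_p needs a recent transformers release):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Hastagaras/Qibil-4B-v0.1-RP"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

messages = [
    {"role": "system", "content": "You are {{char}}. Stay in character."},
    {"role": "user", "content": "Hey, got a minute?"},
]

# apply_chat_template renders ChatML (<|im_start|>role ... <|im_end|>)
# and appends the assistant header so the model continues from there.
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(
    inputs,
    max_new_tokens=256,
    do_sample=True,
    temperature=0.85,  # matches the sampling used for the example below
    min_p=0.05,
)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```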

[EXAMPLE SYSTEM PROMPT]:

[EXAMPLE OUTPUT (Q6_K GGUF, TEMP=0.85, MIN_P=0.05)]:
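The example output above was sampled from a Q6_K GGUF at temperature 0.85 and min-p 0.05. A rough way to reproduce that setup with llama-cpp-python; the GGUF filename below is a placeholder, and the library should pick up the ChatML chat template embedded in the GGUF:

```python
from llama_cpp import Llama

# Placeholder filename: point this at your downloaded Q6_K quant.
llm = Llama(model_path="Qibil-4B-v0.1-RP.Q6_K.gguf", n_ctx=8192)

response = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "You are {{char}}. Stay in character."},
        {"role": "user", "content": "Hey, got a minute?"},
    ],
    temperature=0.85,
    min_p=0.05,
    max_tokens=256,
)
print(response["choices"][0]["message"]["content"])
```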


Datasets:

  • cognitivecomputations/dolphin-r1 (non-reasoning) <- Used this in both stages
  • Gryphe/Sonnet3.5-Charcard-Roleplay (filtered) <- First stage
  • Stories from Reddit (heavily filtered) <- First stage
  • Some structured output data, similar to IFEval <- First stage
  • LMSYS and HelpSteer (only the multi-turn chats) <- Second stage
  • Gemma and Gemini RP data <- Second stage

