This is an experimental RP finetune on top of Qwen3 4B Base. No reasoning data was used; the focus is on general instruction following and RP.

Honestly, I'm not really used to finetuning Qwen models, so if you somehow stumble upon this and decide to try it, I'd really appreciate any feedback you might have, especially if you find issues.


Model Testing

This model has been trained with various structured output formats for both RP and general use. It should therefore be able to follow a structured output format given in the system prompt or in the first message of an RP scenario, at least most of the time; if it doesn't, please leave feedback.
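For instance, a system prompt along these lines (a made-up illustration, not taken from the training data) asks for a fixed per-reply structure:

```
You are {{char}}. Stay in character and reply to {{user}}.
Format every reply exactly as:

*{{char}}'s actions, in italics*
"{{char}}'s spoken dialogue, in quotes"
Mood: one short line describing {{char}}'s current mood
```

The same idea applies outside RP, e.g. asking for JSON or a bullet list in the system prompt.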

Use the ChatML template.
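If you're loading the safetensors weights with transformers, the tokenizer's chat template should already render ChatML, so you shouldn't need to build the prompt by hand. A minimal sketch (the generation settings here are my assumptions, not the author's recommendation; min_p needs a recent transformers release):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Hastagaras/Qibil-4B-v0.1-RP"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

messages = [
    {"role": "system", "content": "You are {{char}}. Stay in character."},
    {"role": "user", "content": "Hey, got a minute?"},
]

# apply_chat_template renders ChatML (<|im_start|>role ... <|im_end|>)
# and appends the assistant header so the model continues from there.
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(
    inputs,
    max_new_tokens=256,
    do_sample=True,
    temperature=0.85,  # matches the sampling used for the example below
    min_p=0.05,
)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```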

[EXAMPLE SYSTEM PROMPT]:

[EXAMPLE OUTPUT (Q6_K GGUF, TEMP=0.85, MIN_P=0.05)]:
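The example output above was sampled from a Q6_K GGUF at temperature 0.85 and min-p 0.05. A rough way to reproduce that setup with llama-cpp-python; the GGUF filename below is a placeholder, and the library should pick up the ChatML chat template embedded in the GGUF:

```python
from llama_cpp import Llama

# Placeholder filename: point this at your downloaded Q6_K quant.
llm = Llama(model_path="Qibil-4B-v0.1-RP.Q6_K.gguf", n_ctx=8192)

response = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "You are {{char}}. Stay in character."},
        {"role": "user", "content": "Hey, got a minute?"},
    ],
    temperature=0.85,
    min_p=0.05,
    max_tokens=256,
)
print(response["choices"][0]["message"]["content"])
```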


Datasets:

  • cognitivecomputations/dolphin-r1 (non-reasoning) <- Used this in both stages
  • Gryphe/Sonnet3.5-Charcard-Roleplay (filtered) <- First stage
  • Stories from Reddit (heavily filtered) <- First stage
  • Some structured output data, similar to IFEval <- First stage
  • LMSYS and HelpSteer (only the multi-turn chats) <- Second stage
  • Gemma and Gemini RP data <- Second stage

