
This is an experimental RP finetune on top of Qwen3 4B Base. No reasoning data, just focusing on general instructions and RP.
Honestly, I'm not really used to finetuning Qwen models, so if you somehow stumble upon this and decide to try it, I'd really appreciate any feedback you might have, especially if you find issues.
Model Testing
This model has been trained on various structured-output data for both RP and general use. It should therefore be able to follow structured output formats based on the system prompt or the first message in an RP scenario. (Well... most of the time; if it doesn't, please leave feedback.)
Use the ChatML template.
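For reference, here's a minimal sketch of loading the model with transformers and building a ChatML prompt through the tokenizer's chat template (Qwen tokenizers ship one, so ChatML formatting comes for free). The repo id and messages are placeholders, the sampling values mirror the example below, and min_p needs a reasonably recent transformers version:

```python
# Minimal sketch (not from the model card): load with transformers,
# build a ChatML prompt via the tokenizer's chat template, and sample.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "your-username/model-name"  # placeholder repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

messages = [
    {"role": "system", "content": "You are {{char}}. Reply in third person, past tense."},
    {"role": "user", "content": "Hello! Who are you?"},
]

# The Qwen tokenizer's chat template produces <|im_start|>role ... <|im_end|>
# ChatML formatting automatically.
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Sampling settings matching the example below (temperature 0.85, min_p 0.05).
outputs = model.generate(
    inputs,
    max_new_tokens=512,
    do_sample=True,
    temperature=0.85,
    min_p=0.05,
)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```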
[EXAMPLE SYSTEM PROMPT]:
[EXAMPLE OUTPUT (Q6_K GGUF, TEMP=0.85, MIN_P=0.05)]:
Datasets:
- cognitivecomputations/dolphin-r1 (non-reasoning) <- Used this in both stages
- Gryphe/Sonnet3.5-Charcard-Roleplay (filtered) <- First stage
- Stories from Reddit (heavily filtered) <- First stage
- Some structured output data, kinda like IFeval <- First stage
- LMSYS and Helpsteer (only the multiturn chats) <- Second stage
- Gemma and Gemini RP data <- Second stage
