Add custom sampler, train data loader, and GRPO-style train loop for ReTool_trainer (c710786, verified, bird-of-paradise committed 4 days ago)
Replace `model.generate` with custom generation function to optimize kv_cache (a0dec77, verified, bird-of-paradise committed 25 days ago)