ProFit: Leveraging High-Value Signals in SFT via Probability-Guided Token Selection Paper • 2601.09195 • Published 10 days ago • 15
Wan-Move: Motion-controllable Video Generation via Latent Trajectory Guidance Paper • 2512.08765 • Published Dec 9, 2025 • 132
TianHongZXY/Qwen3-4B-Thinking-2507-SFT-10-epochs-synthesized-clear-problems-global_step_280 0.5B • Updated Nov 5, 2025
TruthRL: Incentivizing Truthful LLMs via Reinforcement Learning Paper • 2509.25760 • Published Sep 30, 2025 • 55
RAST: Reasoning Activation in LLMs via Small-model Transfer Paper • 2506.15710 • Published May 30, 2025
TianHongZXY/similar_problems_with_three_in_context_problems Viewer • Updated Sep 4, 2025 • 2.16k • 4.07k
A.S.E: A Repository-Level Benchmark for Evaluating Security in AI-Generated Code Paper • 2508.18106 • Published Aug 25, 2025 • 348
TianHongZXY/Top_5_similar_question-NVIDIA-OpenScienceReasoning-2 Viewer • Updated Aug 28, 2025 • 2.16k • 1.19k
RLVR-Decomposed Collection The collection for the Paper "The Surprising Effectiveness of Negative Reinforcement in LLM Reasoning" • 9 items • Updated Jun 1, 2025 • 3
TianHongZXY/OpenR1-Math-46k-8192-Qwen2.5-Math-7B-RoPE-40K-GRPO-use_guide-clip_ratio_upper_0.28 Updated Jul 12, 2025