Darwin Family: MRI-Trust-Weighted Evolutionary Merging for Training-Free Scaling of Language-Model Reasoning Paper • 2605.14386 • Published 8 days ago • 58
MinT: Managed Infrastructure for Training and Serving Millions of LLMs Paper • 2605.13779 • Published 9 days ago • 216
AnyFlow: Any-Step Video Diffusion Model with On-Policy Flow Map Distillation Paper • 2605.13724 • Published 9 days ago • 96
Mean Mode Screaming: Mean–Variance Split Residuals for 1000-Layer Diffusion Transformers Paper • 2605.06169 • Published 15 days ago • 186
Flow-OPD: On-Policy Distillation for Flow Matching Models Paper • 2605.08063 • Published 14 days ago • 97
World-R1: Reinforcing 3D Constraints for Text-to-Video Generation Paper • 2604.24764 • Published 25 days ago • 118
TriAttention: Efficient Long Reasoning with Trigonometric KV Compression Paper • 2604.04921 • Published Apr 6 • 114
Gemma 4 Uncensored Collection Abliterated Gemma 4 models with refusal behavior removed. Biprojection + EGA for MoE. Cross-validated against 686 prompts from 4 datasets. • 8 items • Updated Apr 5 • 85
FIPO: Eliciting Deep Reasoning with Future-KL Influenced Policy Optimization Paper • 2603.19835 • Published Mar 20 • 351
PackForcing: Short Video Training Suffices for Long Video Sampling and Long Context Inference Paper • 2603.25730 • Published Mar 26 • 53
From Sparse to Dense: Multi-View GRPO for Flow Models via Augmented Condition Space Paper • 2603.12648 • Published Mar 13 • 14
Qwen3.5 Unredacted MAX Collection Continual “abliteration” models – experimental. • 8 items • Updated 24 days ago • 4
SpargeAttention2: Trainable Sparse Attention via Hybrid Top-k+Top-p Masking and Distillation Fine-Tuning Paper • 2602.13515 • Published Feb 13 • 45
SLA2: Sparse-Linear Attention with Learnable Routing and QAT Paper • 2602.12675 • Published Feb 13 • 59