Coupling Experts and Routers in Mixture-of-Experts via an Auxiliary Loss Paper • 2512.23447 • Published 3 days ago • 83
DEER: Draft with Diffusion, Verify with Autoregressive Models Paper • 2512.15176 • Published 15 days ago • 41
Fast and Accurate Causal Parallel Decoding using Jacobi Forcing Paper • 2512.14681 • Published 16 days ago • 39
ReFusion: A Diffusion Large Language Model with Parallel Autoregressive Decoding Paper • 2512.13586 • Published 17 days ago • 87
From Next-Token to Next-Block: A Principled Adaptation Path for Diffusion LLMs Paper • 2512.06776 • Published 25 days ago • 24
Learning Unmasking Policies for Diffusion Language Models Paper • 2512.09106 • Published 23 days ago • 9
UltraImage: Rethinking Resolution Extrapolation in Image Diffusion Transformers Paper • 2512.04504 • Published 28 days ago • 16
UltraViCo: Breaking Extrapolation Limits in Video Diffusion Transformers Paper • 2511.20123 • Published Nov 25, 2025 • 17
Every Activation Boosted: Scaling General Reasoner to 1 Trillion Open Language Foundation Paper • 2510.22115 • Published Oct 25, 2025 • 83
INT v.s. FP: A Comprehensive Study of Fine-Grained Low-bit Quantization Formats Paper • 2510.25602 • Published Oct 29, 2025 • 77
Parallel Loop Transformer for Efficient Test-Time Computation Scaling Paper • 2510.24824 • Published Oct 28, 2025 • 16
Uniform Discrete Diffusion with Metric Path for Video Generation Paper • 2510.24717 • Published Oct 28, 2025 • 40
LaSeR: Reinforcement Learning with Last-Token Self-Rewarding Paper • 2510.14943 • Published Oct 16, 2025 • 39
Lumina-DiMOO: An Omni Diffusion Large Language Model for Multi-Modal Generation and Understanding Paper • 2510.06308 • Published Oct 7, 2025 • 54
Language Models Can Learn from Verbal Feedback Without Scalar Rewards Paper • 2509.22638 • Published Sep 26, 2025 • 70