Nested Learning: The Illusion of Deep Learning Architectures • Paper • 2512.24695 • Published 9 days ago • 33
Coupling Experts and Routers in Mixture-of-Experts via an Auxiliary Loss • Paper • 2512.23447 • Published 11 days ago • 93
Dynamic Large Concept Models: Latent Reasoning in an Adaptive Semantic Space • Paper • 2512.24617 • Published 10 days ago • 54
Improving Recursive Transformers with Mixture of LoRAs • Paper • 2512.12880 • Published 26 days ago • 5