VideoSSM: Autoregressive Long Video Generation with Hybrid State-Space Memory • Paper • 2512.04519 • Published Dec 4, 2025 • 5
Self-Evaluation Unlocks Any-Step Text-to-Image Generation • Paper • 2512.22374 • Published Dec 26, 2025 • 17
Learning to Reason in 4D: Dynamic Spatial Understanding for Vision Language Models • Paper • 2512.20557 • Published Dec 23, 2025 • 50
MG-Nav: Dual-Scale Visual Navigation via Sparse Spatial Memory • Paper • 2511.22609 • Published Nov 27, 2025 • 49
EmbRACE-3K: Embodied Reasoning and Action in Complex Environments • Paper • 2507.10548 • Published Jul 14, 2025 • 37
Vision Foundation Models as Effective Visual Tokenizers for Autoregressive Image Generation • Paper • 2507.08441 • Published Jul 11, 2025 • 62