Test-Time Reinforcement Learning for GUI Grounding via Region Consistency Paper • 2508.05615 • Published 7 days ago • 18
Time Is a Feature: Exploiting Temporal Dynamics in Diffusion Language Models Paper • 2508.09138 • Published 2 days ago • 30
Towards Omnimodal Expressions and Reasoning in Referring Audio-Visual Segmentation Paper • 2507.22886 • Published 15 days ago • 9
LAPO: Internalizing Reasoning Efficiency via Length-Adaptive Policy Optimization Paper • 2507.15758 • Published 24 days ago • 34
TTS-VAR: A Test-Time Scaling Framework for Visual Auto-Regressive Generation Paper • 2507.18537 • Published 21 days ago • 17
π^3: Scalable Permutation-Equivariant Visual Geometry Learning Paper • 2507.13347 • Published 28 days ago • 64
view article Article Reachy Mini - The Open-Source Robot for Today's and Tomorrow's AI Builders By thomwolf and 1 other • Jul 9 • 643
InterActHuman: Multi-Concept Human Animation with Layout-Aligned Audio Conditions Paper • 2506.09984 • Published Jun 11 • 15
Revisiting Depth Representations for Feed-Forward 3D Gaussian Splatting Paper • 2506.05327 • Published Jun 5 • 11
SVGenius: Benchmarking LLMs in SVG Understanding, Editing and Generation Paper • 2506.03139 • Published Jun 3 • 16
Active-O3: Empowering Multimodal Large Language Models with Active Perception via GRPO Paper • 2505.21457 • Published May 27 • 14
Omni-R1: Reinforcement Learning for Omnimodal Reasoning via Two-System Collaboration Paper • 2505.20256 • Published May 26 • 17