villa-X: Enhancing Latent Action Modeling in Vision-Language-Action Models Paper • 2507.23682 • Published 11 days ago • 23
ThinkSound: Chain-of-Thought Reasoning in Multimodal Large Language Models for Audio Generation and Editing Paper • 2506.21448 • Published Jun 26 • 8