villa-X: Enhancing Latent Action Modeling in Vision-Language-Action Models Paper • 2507.23682 • Published 7 days ago • 22
ThinkSound 🔊 Generate audio for a video using captions and descriptions • Running on Zero • MCP • 282
ThinkSound: Chain-of-Thought Reasoning in Multimodal Large Language Models for Audio Generation and Editing Paper • 2506.21448 • Published Jun 26 • 8