Multi-Granular Spatio-Temporal Token Merging for Training-Free Acceleration of Video LLMs Paper • 2507.07990 • Published Jul 10 • 45
One Missing Piece for Open-Source Reasoning Models: A Dataset to Mitigate Cold-Starting Short CoT LLMs in RL Paper • 2506.02338 • Published Jun 3 • 4
Optimizing Safe and Aligned Language Generation: A Multi-Objective GRPO Approach Paper • 2503.21819 • Published Mar 26 • 1
Speaking Beyond Language: A Large-Scale Multimodal Dataset for Learning Nonverbal Cues from Video-Grounded Dialogues Paper • 2506.00958 • Published Jun 1 • 20
Don't Look Only Once: Towards Multimodal Interactive Reasoning with Selective Visual Revisitation Paper • 2505.18842 • Published May 24 • 37
G-FOCUS: Towards a Robust Method for Assessing UI Design Persuasiveness Paper • 2505.05026 • Published May 8 • 17
G-FOCUS: Towards a Robust Method for Assessing UI Design Persuasiveness Paper • 2505.05026 • Published May 8 • 17
VisEscape: A Benchmark for Evaluating Exploration-driven Decision-making in Virtual Escape Rooms Paper • 2503.14427 • Published Mar 18 • 19
DisCoRD: Discrete Tokens to Continuous Motion via Rectified Flow Decoding Paper • 2411.19527 • Published Nov 29, 2024 • 10
Web Agents with World Models: Learning and Leveraging Environment Dynamics in Web Navigation Paper • 2410.13232 • Published Oct 17, 2024 • 45