V-MAGE: A Game Evaluation Framework for Assessing Visual-Centric Capabilities in Multimodal Large Language Models Paper • 2504.06148 • Published Apr 8 • 13
Beyond Words: Advancing Long-Text Image Generation via Multimodal Autoregressive Models Paper • 2503.20198 • Published Mar 26 • 4
UniVTG: Towards Unified Video-Language Temporal Grounding Paper • 2307.16715 • Published Jul 31, 2023 • 11