VRBench: A Benchmark for Multi-Step Reasoning in Long Narrative Videos Paper • 2506.10857 • Published 16 days ago • 31
InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models Paper • 2504.10479 • Published Apr 14 • 274
VideoChat-Flash: Hierarchical Compression for Long-Context Video Modeling Paper • 2501.00574 • Published Dec 31, 2024 • 6
Task Preference Optimization: Improving Multimodal Large Language Models with Vision Task Alignment Paper • 2412.19326 • Published Dec 26, 2024 • 18