EMMA: Efficient Multimodal Understanding, Generation, and Editing with a Unified Architecture • Paper • 2512.04810 • Published 23 days ago • 25
YuE: Scaling Open Foundation Models for Long-Form Music Generation • Paper • 2503.08638 • Published Mar 11 • 71
VideoAutoArena: An Automated Arena for Evaluating Large Multimodal Models in Video Analysis through User Simulation • Paper • 2411.13281 • Published Nov 20, 2024 • 20