XVerse: Consistent Multi-Subject Control of Identity and Semantic Attributes via DiT Modulation Paper • 2506.21416 • Published 19 days ago • 28
Tarsier2: Advancing Large Vision-Language Models from Detailed Video Description to Comprehensive Video Understanding Paper • 2501.07888 • Published Jan 14 • 16