VMem: Consistent Interactive Video Scene Generation with Surfel-Indexed View Memory Paper • 2506.18903 • Published Jun 23 • 22
PixelHacker: Image Inpainting with Structural and Semantic Consistency Paper • 2504.20438 • Published Apr 29 • 44
Paper2Code: Automating Code Generation from Scientific Papers in Machine Learning Paper • 2504.17192 • Published Apr 24 • 114
DreamActor-M1: Holistic, Expressive and Robust Human Image Animation with Hybrid Guidance Paper • 2504.01724 • Published Apr 2 • 69
FirePlace: Geometric Refinements of LLM Common Sense Reasoning for 3D Object Placement Paper • 2503.04919 • Published Mar 6 • 8
💫StarVector Models Collection StarVector is a multimodal LLM for Scalable Vector Graphics (SVG) generation, producing structured SVG code directly from images and text. • 2 items • Updated Mar 20 • 96
Emilia: A Large-Scale, Extensive, Multilingual, and Diverse Dataset for Speech Generation Paper • 2501.15907 • Published Jan 27 • 17
UI-TARS: Pioneering Automated GUI Interaction with Native Agents Paper • 2501.12326 • Published Jan 21 • 62
Go-with-the-Flow: Motion-Controllable Video Diffusion Models Using Real-Time Warped Noise Paper • 2501.08331 • Published Jan 14 • 20
InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions Paper • 2412.09596 • Published Dec 12, 2024 • 99
ProcessBench: Identifying Process Errors in Mathematical Reasoning Paper • 2412.06559 • Published Dec 9, 2024 • 85
SwiftEdit: Lightning Fast Text-Guided Image Editing via One-Step Diffusion Paper • 2412.04301 • Published Dec 5, 2024 • 42
One Shot, One Talk: Whole-body Talking Avatar from a Single Image Paper • 2412.01106 • Published Dec 2, 2024 • 22
LLaMA-Berry: Pairwise Optimization for O1-like Olympiad-Level Mathematical Reasoning Paper • 2410.02884 • Published Oct 3, 2024 • 55
MobileLLM: Optimizing Sub-billion Parameter Language Models for On-Device Use Cases Paper • 2402.14905 • Published Feb 22, 2024 • 133
Distilling an End-to-End Voice Assistant Without Instruction Training Data Paper • 2410.02678 • Published Oct 3, 2024 • 23
Ferret-UI: Grounded Mobile UI Understanding with Multimodal LLMs Paper • 2404.05719 • Published Apr 8, 2024 • 83