VertexRegen: Mesh Generation with Continuous Level of Detail Paper • 2508.09062 • Published 2 days ago • 23
SitEmb-v1.5: Improved Context-Aware Dense Retrieval for Semantic Association and Long Story Comprehension Paper • 2508.01959 • Published 11 days ago • 55
ForCenNet: Foreground-Centric Network for Document Image Rectification Paper • 2507.19804 • Published 20 days ago • 11
How Well Does GPT-4o Understand Vision? Evaluating Multimodal Foundation Models on Standard Computer Vision Tasks Paper • 2507.01955 • Published Jul 2 • 35
Look Before You Leap: A GUI-Critic-R1 Model for Pre-Operative Error Diagnosis in GUI Automation Paper • 2506.04614 • Published Jun 5 • 16
RetroInfer: A Vector-Storage Approach for Scalable Long-Context LLM Inference Paper • 2505.02922 • Published May 5 • 28
LLMs for Engineering: Teaching Models to Design High Powered Rockets Paper • 2504.19394 • Published Apr 27 • 14
GigaTok: Scaling Visual Tokenizers to 3 Billion Parameters for Autoregressive Image Generation Paper • 2504.08736 • Published Apr 11 • 47
SoTA with Less: MCTS-Guided Sample Selection for Data-Efficient Visual Reasoning Self-Improvement Paper • 2504.07934 • Published Apr 10 • 19
TextCrafter: Accurately Rendering Multiple Texts in Complex Visual Scenes Paper • 2503.23461 • Published Mar 30 • 95
VideoPainter: Any-length Video Inpainting and Editing with Plug-and-Play Context Control Paper • 2503.05639 • Published Mar 7 • 24
EuroBERT: Scaling Multilingual Encoders for European Languages Paper • 2503.05500 • Published Mar 7 • 81
SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Features Paper • 2502.14786 • Published Feb 20 • 146