Submitted by Nicolas-BZRD 76 Should We Still Pretrain Encoders with Masked Language Modeling? · 8 authors 66 9
Submitted by RunpeiDong 43 DreamVLA: A Vision-Language-Action Model Dreamed with Comprehensive World Knowledge · 13 authors 147 2
Submitted by JunhaoZhuang 41 4DSloMo: 4D Reconstruction for High Speed Scene with Asynchronous Capture · 7 authors 2
Submitted by RowitZou 39 Pre-Trained Policy Discriminators are General Reward Models · 22 authors 146 1
Submitted by hiyouga 39 Easy Dataset: A Unified and Extensible Framework for Synthesizing LLM Fine-Tuning Data from Unstructured Documents · 7 authors 10.1k 1
Submitted by KYLN24 23 BMMR: A Large-Scale Bilingual Multimodal Multi-Discipline Reasoning Dataset · 15 authors 14 1
Submitted by Bibaolong 18 RefineX: Learning to Refine Pre-training Data at Scale from Expert-Guided Programs · 10 authors 1
Submitted by ziyjiang 16 VLM2Vec-V2: Advancing Multimodal Embedding for Videos, Images, and Visual Documents · 13 authors 363 3
Submitted by justinyyy 15 OmniDraft: A Cross-vocabulary, Online Adaptive Drafter for On-device Speculative Decoding · 7 authors 1
Submitted by ZZXF 13 Reviving Cultural Heritage: A Novel Approach for Comprehensive Historical Document Restoration · 8 authors 41 1
Submitted by ai-hyz 11 Evaluating Memory in LLM Agents via Incremental Multi-Turn Interactions · 3 authors 49 2
Submitted by SteveZeyuZhang 10 PresentAgent: Multimodal Agent for Presentation Video Generation · 7 authors 93 1
Submitted by xxzcc 9 ArtifactsBench: Bridging the Visual-Interactive Gap in LLM Code Generation Evaluation · 32 authors 1
Submitted by Johnyquest7 9 Preserving Privacy, Increasing Accessibility, and Reducing Cost: An On-Device Artificial Intelligence Model for Medical Transcription and Note Generation · 6 authors 1
Submitted by cedricbonhomme 6 VLAI: A RoBERTa-Based Model for Automated Vulnerability Severity Classification · 2 authors 13 1
Submitted by danielchyeh 5 Beyond Simple Edits: X-Planner for Complex Instruction-Based Image Editing · 7 authors 1
Submitted by ashutosh1919 5 Disambiguation-Centric Finetuning Makes Enterprise Tool-Calling LLMs More Realistic and Less Risky · 3 authors 1
Submitted by amanchadha 3 MOD-X: A Modular Open Decentralized eXchange Framework proposal for Heterogeneous Interoperable Artificial Agents · 5 authors 1
Submitted by jannalu 2 Evaluating LLMs on Real-World Forecasting Against Human Superforecasters · 1 author 2