Daily Papers

by AK and the research community

Mar 19

Submitted by

ZhangRC

RWKV-7 "Goose" with Expressive Dynamic State Evolution

·
15 authors

Submitted by

akhaliq

DAPO: An Open-Source LLM Reinforcement Learning System at Scale

·
35 authors

5

Submitted by

ZechenBai

Impossible Videos

·
3 authors

Submitted by

nebulae09

Creation-MMBench: Assessing Context-Aware Creative Intelligence in MLLM

·
12 authors

Submitted by

carboncoo

DeepPerception: Advancing R1-like Cognitive Visual Perception in MLLMs for Knowledge-Intensive Visual Grounding

·
8 authors

Submitted by

ZhaoyangLyu

Infinite Mobility: Scalable High-Fidelity Synthesis of Articulated Objects via Procedural Generation

·
12 authors

Submitted by

akhaliq

AudioX: Diffusion Transformer for Anything-to-Audio Generation

·
8 authors

Submitted by

yifanzhang114

Aligning Multimodal LLM with Human Preference: A Survey

·
17 authors

Submitted by

cckevinn

CapArena: Benchmarking and Analyzing Detailed Image Captioning in the LLM Era

·
10 authors

Submitted by

mathfinder

Frac-Connections: Fractional Extension of Hyper-Connections

·
8 authors

4

Submitted by

akhaliq

Cosmos-Transfer1: Conditional World Generation with Adaptive Multimodal Control

·
39 authors

Submitted by

zhangysk

FlexWorld: Progressively Expanding 3D Scenes for Flexiable-View Synthesis

·
9 authors

Submitted by

akhaliq

Measuring AI Ability to Complete Long Tasks

·
25 authors

Submitted by

kumarkrishna

Atlas: Multi-Scale Attention Improves Long Context Image Modeling

·
9 authors

Submitted by

Lingaaaaaaa

Temporal Consistency for LLM Reasoning Process Error Identification

·
7 authors

Submitted by

kpzhang996

MPBench: A Comprehensive Multimodal Reasoning Benchmark for Process Errors Identification

·
9 authors

2

Submitted by

BestWishYsh

Concat-ID: Towards Universal Identity-Preserving Video Synthesis

·
5 authors

Submitted by

jacklishufan

Reflect-DiT: Inference-Time Scaling for Text-to-Image Diffusion Transformers via In-Context Reflection

·
7 authors

2

Submitted by

edaxberger

MM-Spatial: Exploring 3D Spatial Understanding in Multimodal LLMs

·
11 authors

4

Submitted by

PengDa02

Towards Self-Improving Systematic Cognition for Next-Generation Foundation MLLMs

·
9 authors

Submitted by

Spravil

Florenz: Scaling Laws for Systematic Generalization in Vision-Language Models

·
3 authors

2

Submitted by

HoangHa

Pensez: Less Data, Better Reasoning -- Rethinking French LLM

·
1 author

2

Submitted by

kpzhang996

PEBench: A Fictitious Dataset to Benchmark Machine Unlearning for Multimodal Large Language Models

·
11 authors

2

Submitted by

ZhiyuanZeng

EvalTree: Profiling Language Model Weaknesses via Hierarchical Capability Trees

·
4 authors

Submitted by

zhuoyanxu

Learning to Inference Adaptively for Multimodal Large Language Models

·
7 authors

Submitted by

zhenzhangcs

PyGDA: A Python Library for Graph Domain Adaptation

·
3 authors

2

Submitted by

DamianBoborzi

MeshFleet: Filtered and Annotated 3D Vehicle Dataset for Domain Specific Generative Modeling

·
6 authors

Submitted by

tobi1modna

Hyperbolic Safety-Aware Vision-Language Models

·
5 authors

2

Submitted by

Mingtongz

KUDA: Keypoints to Unify Dynamics Learning and Visual Prompting for Open-Vocabulary Robotic Manipulation

·
3 authors

Submitted by

yuwendu

RoCo-Sim: Enhancing Roadside Collaborative Perception through Foreground Simulation

·
9 authors

2

Submitted by

cxliu0314

CoLMDriver: LLM-based Negotiation Benefits Cooperative Autonomous Driving

·
5 authors

2