π³: Scalable Permutation-Equivariant Visual Geometry Learning Paper • 2507.13347 • Published 21 days ago • 63
A High-Quality Dataset and Reliable Evaluation for Interleaved Image-Text Generation Paper • 2506.09427 • Published Jun 11, 2025 • 9
Lumina-Next: Making Lumina-T2X Stronger and Faster with Next-DiT Paper • 2406.18583 • Published Jun 5, 2024
EfficientQAT: Efficient Quantization-Aware Training for Large Language Models Paper • 2407.11062 • Published Jul 10, 2024 • 10
Dynamic Multimodal Evaluation with Flexible Complexity by Vision-Language Bootstrapping Paper • 2410.08695 • Published Oct 11, 2024
ZipAR: Accelerating Autoregressive Image Generation through Spatial Locality Paper • 2412.04062 • Published Dec 5, 2024 • 9
LiT: Delving into a Simplified Linear Diffusion Transformer for Image Generation Paper • 2501.12976 • Published Jan 22, 2025
Neighboring Autoregressive Modeling for Efficient Visual Generation Paper • 2503.10696 • Published Mar 12, 2025 • 8
CLS-RL: Image Classification with Rule-Based Reinforcement Learning Paper • 2503.16188 • Published Mar 20, 2025 • 11
MDK12-Bench: A Multi-Discipline Benchmark for Evaluating Reasoning in Multimodal Large Language Models Paper • 2504.05782 • Published Apr 8, 2025 • 4