Arbitrary-steps Image Super-resolution via Diffusion Inversion
Paper • 2412.09013 • Published • 13
Deep Researcher with Test-Time Diffusion
Paper • 2507.16075 • Published • 67
∇NABLA: Neighborhood Adaptive Block-Level Attention
Paper • 2507.13546 • Published • 124
Yume: An Interactive World Generation Model
Paper • 2507.17744 • Published • 88
Latent Denoising Makes Good Visual Tokenizers
Paper • 2507.15856 • Published • 10
Dynamic Reflections: Probing Video Representations with Text Alignment
Paper • 2511.02767 • Published • 3
A Style is Worth One Code: Unlocking Code-to-Style Image Generation with Discrete Style Space
Paper • 2511.10555 • Published • 60
MiroThinker: Pushing the Performance Boundaries of Open-Source Research Agents via Model, Context, and Interactive Scaling
Paper • 2511.11793 • Published • 164
Uni-MoE-2.0-Omni: Scaling Language-Centric Omnimodal Large Model with Advanced MoE, Training and Data
Paper • 2511.12609 • Published • 103
MMaDA-Parallel: Multimodal Large Diffusion Language Models for Thinking-Aware Editing and Generation
Paper • 2511.09611 • Published • 68
TiViBench: Benchmarking Think-in-Video Reasoning for Video Generative Models
Paper • 2511.13704 • Published • 42
Evolve the Method, Not the Prompts: Evolutionary Synthesis of Jailbreak Attacks on LLMs
Paper • 2511.12710 • Published • 38
Back to Basics: Let Denoising Generative Models Denoise
Paper • 2511.13720 • Published • 67
Draft and Refine with Visual Experts
Paper • 2511.11005 • Published • 2
Kandinsky 5.0: A Family of Foundation Models for Image and Video Generation
Paper • 2511.14993 • Published • 226
Medal S: Spatio-Textual Prompt Model for Medical Segmentation
Paper • 2511.13001 • Published • 1
Thinking-while-Generating: Interleaving Textual Reasoning throughout Visual Generation
Paper • 2511.16671 • Published • 15
NaTex: Seamless Texture Generation as Latent Color Diffusion
Paper • 2511.16317 • Published • 15
Scaling Spatial Intelligence with Multimodal Foundation Models
Paper • 2511.13719 • Published • 46
Plan-X: Instruct Video Generation via Semantic Planning
Paper • 2511.17986 • Published • 16
SAM 3D: 3Dfy Anything in Images
Paper • 2511.16624 • Published • 110
DeCo: Frequency-Decoupled Pixel Diffusion for End-to-End Image Generation
Paper • 2511.19365 • Published • 63
UltraFlux: Data-Model Co-Design for High-quality Native 4K Text-to-Image Generation across Diverse Aspect Ratios
Paper • 2511.18050 • Published • 37
In-Video Instructions: Visual Signals as Generative Control
Paper • 2511.19401 • Published • 30
Paper • 2511.11238 • Published • 37
One Small Step in Latent, One Giant Leap for Pixels: Fast Latent Upscale Adapter for Your Diffusion Models
Paper • 2511.10629 • Published • 123
SliderEdit: Continuous Image Editing with Fine-Grained Instruction Control
Paper • 2511.09715 • Published • 8
Generating an Image From 1,000 Words: Enhancing Text-to-Image With Structured Captions
Paper • 2511.06876 • Published • 27
EVTAR: End-to-End Try on with Additional Unpaired Visual Reference
Paper • 2511.00956 • Published • 4
UniLumos: Fast and Unified Image and Video Relighting with Physics-Plausible Feedback
Paper • 2511.01678 • Published • 35
Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model?
Paper • 2504.13837 • Published • 139