PUSA V1.0: Surpassing Wan-I2V with $500 Training Cost by Vectorized Timestep Adaptation Paper • 2507.16116 • Published 23 days ago • 10
Tora2: Motion and Appearance Customized Diffusion Transformer for Multi-Entity Video Generation Paper • 2507.05963 • Published Jul 8 • 12
Doodle Your Keypoints: Sketch-Based Few-Shot Keypoint Detection Paper • 2507.07994 • Published Jul 10 • 2
Gemini 2.5: Pushing the Frontier with Advanced Reasoning, Multimodality, Long Context, and Next Generation Agentic Capabilities Paper • 2507.06261 • Published Jul 7 • 59
UGC-VideoCaptioner: An Omni UGC Video Detail Caption Model and New Benchmarks Paper • 2507.11336 • Published 30 days ago • 4
AnyCap Project: A Unified Framework, Dataset, and Benchmark for Controllable Omni-modal Captioning Paper • 2507.12841 • Published 28 days ago • 40
T-LoRA: Single Image Diffusion Model Customization Without Overfitting Paper • 2507.05964 • Published Jul 8 • 115
Seeing Voices: Generating A-Roll Video from Audio with Mirage Paper • 2506.08279 • Published Jun 9 • 28
Seedance 1.0: Exploring the Boundaries of Video Generation Models Paper • 2506.09113 • Published Jun 10 • 101
MiniMax-M1: Scaling Test-Time Compute Efficiently with Lightning Attention Paper • 2506.13585 • Published Jun 16 • 261
SkyReels-Audio: Omni Audio-Conditioned Talking Portraits in Video Diffusion Transformers Paper • 2506.00830 • Published Jun 1 • 7
SeedVR2: One-Step Video Restoration via Diffusion Adversarial Post-Training Paper • 2506.05301 • Published Jun 5 • 55
Qwen3 Embedding: Advancing Text Embedding and Reranking Through Foundation Models Paper • 2506.05176 • Published Jun 5 • 68
STARFlow: Scaling Latent Normalizing Flows for High-resolution Image Synthesis Paper • 2506.06276 • Published Jun 6 • 22
Teller: Real-Time Streaming Audio-Driven Portrait Animation with Autoregressive Motion Generation Paper • 2503.18429 • Published Mar 24 • 3
OmniTalker: Real-Time Text-Driven Talking Head Generation with In-Context Audio-Visual Style Replication Paper • 2504.02433 • Published Apr 3 • 1
RelationAdapter: Learning and Transferring Visual Relation with Diffusion Transformers Paper • 2506.02528 • Published Jun 3 • 15