HanSaem Kim

kensaem

AI & ML interests

None yet

Recent Activity

upvoted a paper 17 days ago

PUSA V1.0: Surpassing Wan-I2V with $500 Training Cost by Vectorized Timestep Adaptation

upvoted a paper 27 days ago

StreamDiT: Real-Time Streaming Text-to-Video Generation

upvoted a paper 27 days ago

Tora2: Motion and Appearance Customized Diffusion Transformer for Multi-Entity Video Generation


Organizations

None yet

upvoted a paper 17 days ago

PUSA V1.0: Surpassing Wan-I2V with $500 Training Cost by Vectorized Timestep Adaptation

Paper • 2507.16116 • Published 23 days ago • 10

upvoted 7 papers 27 days ago

StreamDiT: Real-Time Streaming Text-to-Video Generation

Paper • 2507.03745 • Published Jul 4 • 28

Tora2: Motion and Appearance Customized Diffusion Transformer for Multi-Entity Video Generation

Paper • 2507.05963 • Published Jul 8 • 12

4KAgent: Agentic Any Image to 4K Super-Resolution

Paper • 2507.07105 • Published Jul 9 • 97

Doodle Your Keypoints: Sketch-Based Few-Shot Keypoint Detection

Paper • 2507.07994 • Published Jul 10 • 2

Gemini 2.5: Pushing the Frontier with Advanced Reasoning, Multimodality, Long Context, and Next Generation Agentic Capabilities

Paper • 2507.06261 • Published Jul 7 • 59

UGC-VideoCaptioner: An Omni UGC Video Detail Caption Model and New Benchmarks

Paper • 2507.11336 • Published 30 days ago • 4

AnyCap Project: A Unified Framework, Dataset, and Benchmark for Controllable Omni-modal Captioning

Paper • 2507.12841 • Published 28 days ago • 40

upvoted 2 papers about 1 month ago

T-LoRA: Single Image Diffusion Model Customization Without Overfitting

Paper • 2507.05964 • Published Jul 8 • 115

SingLoRA: Low Rank Adaptation Using a Single Matrix

Paper • 2507.05566 • Published Jul 8 • 110

upvoted 3 papers about 2 months ago

upvoted 7 papers 2 months ago

SkyReels-Audio: Omni Audio-Conditioned Talking Portraits in Video Diffusion Transformers

Paper • 2506.00830 • Published Jun 1 • 7

SeedVR2: One-Step Video Restoration via Diffusion Adversarial Post-Training

Paper • 2506.05301 • Published Jun 5 • 55

Qwen3 Embedding: Advancing Text Embedding and Reranking Through Foundation Models

Paper • 2506.05176 • Published Jun 5 • 68

STARFlow: Scaling Latent Normalizing Flows for High-resolution Image Synthesis

Paper • 2506.06276 • Published Jun 6 • 22

Teller: Real-Time Streaming Audio-Driven Portrait Animation with Autoregressive Motion Generation

Paper • 2503.18429 • Published Mar 24 • 3

OmniTalker: Real-Time Text-Driven Talking Head Generation with In-Context Audio-Visual Style Replication

Paper • 2504.02433 • Published Apr 3 • 1

RelationAdapter: Learning and Transferring Visual Relation with Diffusion Transformers

Paper • 2506.02528 • Published Jun 3 • 15
