VACE ControlNet Guide

VACE is a powerful ControlNet that enables Video-to-Video and Reference-to-Video generation. It allows you to inject your own images into output videos, animate characters, perform inpainting/outpainting, and continue videos.

Overview

VACE is probably one of the most powerful Wan models available. With it, you can:

  • Inject people or objects into scenes
  • Animate characters
  • Perform video inpainting and outpainting
  • Continue existing videos
  • Transfer motion from one video to another
  • Change the style of scenes while preserving depth

Getting Started

Model Selection

  1. Select either "Vace 1.3B" or "Vace 14B" from the dropdown menu
  2. Note: VACE works best with videos up to 7 seconds with the Riflex option enabled

Input Types

VACE accepts three types of visual hints (which can be combined):

1. Control Video

  • Transfer motion or depth to a new video
  • Use only the first n frames and extrapolate the rest
  • Perform inpainting by painting the areas to replace in grey (value 127)
  • Grey areas are filled in according to the text prompt and reference images
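The grey-area convention above can be prepared outside WanGP with any tool that edits frames. A minimal sketch with NumPy, assuming the control video is already decoded into a uint8 array shaped (frames, height, width, 3); the function name and the rectangular region are illustrative and not part of WanGP:

```python
import numpy as np

GREY = 127  # value the guide names for areas VACE should fill in


def grey_out_region(frames, top, left, height, width):
    """Return a copy of `frames` with one rectangle painted grey (127).

    `frames` is assumed to be a uint8 array shaped
    (num_frames, frame_height, frame_width, 3). The helper is hypothetical.
    """
    out = frames.copy()
    out[:, top:top + height, left:left + width, :] = GREY
    return out


# Tiny synthetic clip: 8 black frames of 64x64 RGB.
clip = np.zeros((8, 64, 64, 3), dtype=np.uint8)
control = grey_out_region(clip, top=16, left=16, height=32, width=32)
```

The same rectangle is greyed in every frame here; a moving subject would need a per-frame region, which is what the Matanyone tool described later is for.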

2. Reference Images

  • Use as background/setting for the video
  • Inject people or objects of your choice
  • Select multiple reference images
  • Tip: Replace complex backgrounds with white for better object integration
  • Always describe injected objects/people explicitly in your text prompt
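If a subject matte is already available, the white-background tip can be applied by hand. A minimal sketch, assuming an RGB image and a matte with values in [0, 1] where 1 marks the subject; the helper name is hypothetical, and WanGP's own background removal options may work differently:

```python
import numpy as np


def flatten_onto_white(rgb, alpha):
    """Composite `rgb` over a white background using `alpha`.

    rgb:   (H, W, 3) uint8 image
    alpha: (H, W) float matte in [0, 1], 1 = keep the subject pixel
    """
    a = alpha[..., None].astype(np.float32)
    white = np.full(rgb.shape, 255.0, dtype=np.float32)
    out = rgb.astype(np.float32) * a + white * (1.0 - a)
    return np.clip(np.rint(out), 0, 255).astype(np.uint8)


# Synthetic example: a 2x2 reddish "object" in the middle of a black image.
image = np.zeros((4, 4, 3), dtype=np.uint8)
image[1:3, 1:3] = (200, 50, 50)
matte = np.zeros((4, 4), dtype=np.float32)
matte[1:3, 1:3] = 1.0
reference = flatten_onto_white(image, matte)
```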

3. Video Mask

  • Stronger control over which parts to keep (black) or replace (white)
  • Perfect for inpainting/outpainting
  • Example: White mask except at beginning/end (black) keeps first/last frames while generating middle content
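The first/last-frame example above can be sketched as a mask array: white (255) where content is generated, black (0) where frames are kept. The function name, the frame count, and the array layout are assumptions for illustration, not WanGP's internal format:

```python
import numpy as np


def keep_ends_mask(num_frames, height, width, keep_start=1, keep_end=1):
    """Build a video mask that keeps the first/last frames and replaces the rest.

    Black (0) = keep, white (255) = replace, following the guide's convention.
    """
    if keep_start < 0 or keep_end < 0 or keep_start + keep_end > num_frames:
        raise ValueError("kept frames must fit inside the clip")
    mask = np.full((num_frames, height, width), 255, dtype=np.uint8)
    mask[:keep_start] = 0                 # frames kept at the beginning
    mask[num_frames - keep_end:] = 0      # frames kept at the end
    return mask


# 81 frames is an arbitrary example length.
mask = keep_ends_mask(81, 48, 64)
```

Each frame of the array would then be written out as one frame of the mask video, at the same resolution and length as the control video.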

Common Use Cases

Motion Transfer

Goal: Animate a character of your choice using motion from another video
Setup:

  • Reference Images: Your character
  • Control Video: Person performing desired motion
  • Text Prompt: Describe your character and the action

Object/Person Injection

Goal: Insert people or objects into a scene
Setup:

  • Reference Images: The people/objects to inject
  • Text Prompt: Describe the scene and explicitly mention the injected elements

Character Animation

Goal: Animate a character based on text description
Setup:

  • Control Video: Video of person moving
  • Text Prompt: Detailed description of your character

Style Transfer with Depth

Goal: Change scene style while preserving spatial relationships
Setup:

  • Control Video: Original video (for depth information)
  • Text Prompt: New style description

Integrated Matanyone Tool

WanGP includes the Matanyone tool, tuned specifically for VACE workflows. It creates a control video and its matching mask video in a single pass.

Creating Face Replacement Masks

  1. Load your video in Matanyone
  2. Click on the face in the first frame
  3. Create a mask for the face
  4. Generate both control video and mask video with "Generate Video Matting"
  5. Export to VACE with "Export to current Video Input and Video Mask"
  6. Load replacement face image in Reference Images field

Advanced Matanyone Tips

  • Negative Point Prompts: Remove parts from current selection
  • Sub Masks: Create multiple independent masks, then combine them
  • Background Masks: Select everything except the character (useful for background replacement)
  • Enable/disable sub masks in Matanyone settings

Recommended Settings

Quality Settings

  • Skip Layer Guidance: Turn ON with default configuration for better results
  • Long Prompts: Use detailed descriptions, especially for background elements not in reference images
  • Steps: Use at least 15 steps for good quality, 30+ for best results

Sliding Window Settings

For very long videos, configure sliding windows properly:

  • Window Size: Set appropriate duration for your content
  • Overlap Frames: Long enough for motion continuity, short enough to avoid blur propagation
  • Discard Last Frames: Remove at least 4 frames from each window (VACE 1.3B tends to blur final frames)

Background Removal

VACE includes automatic background removal options:

  • Use for reference images containing people/objects
  • Don't use for landscape/setting reference images (first reference image)
  • Multiple background removal types available

Window Sliding for Long Videos

Generate videos up to 1 minute by merging multiple windows:

How It Works

  • Each window uses corresponding time segment from control video
  • Example: 0-4s control video → first window, 4-8s → second window, etc.
  • Automatic overlap management ensures smooth transitions
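One way to picture the mapping is as frame ranges in the control video. The sketch below is an interpretation consistent with the frame-count formula in this guide, not WanGP's actual code: each window is assumed to start `window_size - overlap - discard` frames after the previous one. The function name and the 16 fps figure are assumptions.

```python
def window_ranges(num_windows, window_size, overlap=0, discard=0):
    """Return (start, end) control-video frame ranges, end exclusive.

    Assumes each window advances by window_size - overlap - discard frames.
    """
    stride = window_size - overlap - discard
    if stride <= 0:
        raise ValueError("overlap + discard must be smaller than window_size")
    return [(i * stride, i * stride + window_size) for i in range(num_windows)]


# No overlap, no discard: contiguous segments, as in the 0-4s / 4-8s example
# (64 frames is 4 seconds if the video runs at an assumed 16 fps).
simple = window_ranges(3, 64)
# [(0, 64), (64, 128), (128, 192)]

# With overlap and discard, consecutive windows share control frames.
shared = window_ranges(3, 81, overlap=8, discard=4)
# [(0, 81), (69, 150), (138, 219)]
```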

Settings

  • Window Size: Duration of each generation window
  • Overlap Frames: Frames shared between windows for continuity
  • Discard Last Frames: Remove poor-quality ending frames
  • Add Overlapped Noise: Reduce quality degradation over time

Formula

Generated Frames = [Windows - 1] × [Window Size - Overlap - Discard] + Window Size
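The formula translates directly into code, and inverting it gives the number of windows needed for a target length. A small sketch; the function names and the sample values (81-frame windows, 8 overlap, 4 discard) are illustrative, not WanGP defaults:

```python
def generated_frames(windows, window_size, overlap, discard):
    """Total frames: (windows - 1) * (window_size - overlap - discard) + window_size."""
    step = window_size - overlap - discard
    if windows < 1 or step <= 0:
        raise ValueError("need at least one window and a positive step")
    return (windows - 1) * step + window_size


def windows_needed(target_frames, window_size, overlap, discard):
    """Smallest window count whose generated_frames() reaches target_frames."""
    step = window_size - overlap - discard
    if step <= 0:
        raise ValueError("overlap + discard must be smaller than window_size")
    if target_frames <= window_size:
        return 1
    extra = target_frames - window_size
    return 1 + -(-extra // step)  # ceiling division without floats


total = generated_frames(3, 81, 8, 4)    # 2 * 69 + 81 = 219
count = windows_needed(300, 81, 8, 4)    # 5 windows (4 would give only 288)
```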

Multi-Line Prompts (Experimental)

  • Each line of the prompt is used for a different window
  • If there are more windows than prompt lines, the last line repeats
  • Separate lines with a line break (Enter key)
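The line-to-window rule can be sketched as a lookup that clamps to the last line. The helper is hypothetical, and skipping blank lines is an assumption of this sketch rather than documented WanGP behavior:

```python
def prompt_for_window(prompt_text, window_index):
    """Pick the prompt line for a window; the last line repeats when lines run out."""
    lines = [line.strip() for line in prompt_text.splitlines() if line.strip()]
    if not lines:
        raise ValueError("prompt is empty")
    return lines[min(window_index, len(lines) - 1)]


prompt = "A woman walks into a cafe\nShe sits down and orders coffee"
per_window = [prompt_for_window(prompt, i) for i in range(4)]
# Windows 0 and 1 get their own line; windows 2 and 3 reuse the second line.
```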

Advanced Features

Extend Video

Click "Extend the Video Sample, Please!" during generation to add more windows dynamically.

Noise Addition

Add noise to overlapped frames to hide accumulated errors and quality degradation.

Frame Truncation

Automatically remove lower-quality final frames from each window (recommended: 4 frames for VACE 1.3B).
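For anyone post-processing windows by hand, truncation amounts to slicing off the tail of each window. A minimal sketch with a hypothetical helper; note that the naive `frames[:-discard]` returns an empty sequence when `discard` is 0, so the end index is computed explicitly:

```python
def drop_last_frames(frames, discard=4):
    """Return `frames` without its last `discard` items (works on lists and arrays)."""
    if discard < 0:
        raise ValueError("discard must not be negative")
    end = max(0, len(frames) - discard)
    return frames[:end]


window = list(range(10))               # stand-in for 10 decoded frames
trimmed = drop_last_frames(window)     # keeps frames 0..5
untouched = drop_last_frames(window, 0)
```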

External Resources

Official VACE Resources

Recommended External Tools

  • Annotation Tools: For creating precise masks
  • Video Editors: For preparing control videos
  • Background Removal: For cleaning reference images

Troubleshooting

Poor Quality Results

  1. Use longer, more detailed prompts
  2. Enable Skip Layer Guidance
  3. Increase number of steps (30+)
  4. Check reference image quality
  5. Ensure proper mask creation

Inconsistent Windows

  1. Increase overlap frames
  2. Use consistent prompting across windows
  3. Add noise to overlapped frames
  4. Reduce discard frames if losing too much content

Memory Issues

  1. Use VACE 1.3B instead of 14B
  2. Reduce video length or resolution
  3. Decrease window size
  4. Enable quantization

Blurry Results

  1. Reduce overlap frames
  2. Increase discard last frames
  3. Use higher resolution reference images
  4. Check control video quality

Tips for Best Results

  1. Detailed Prompts: Describe everything in the scene, especially elements not in reference images
  2. Quality Reference Images: Use high-resolution, well-lit reference images
  3. Proper Masking: Take time to create precise masks with Matanyone
  4. Iterative Approach: Start with short videos, then extend successful results
  5. Background Preparation: Remove complex backgrounds from object/person reference images
  6. Consistent Lighting: Match lighting between reference images and intended scene