	# VACE ControlNet Guide

	VACE is a powerful ControlNet that enables Video-to-Video and Reference-to-Video generation. It allows you to inject your own images into output videos, animate characters, perform inpainting/outpainting, and continue videos.

	## Overview

	VACE is probably one of the most powerful Wan models available. With it, you can:
	- Inject people or objects into scenes
	- Animate characters
	- Perform video inpainting and outpainting
	- Continue existing videos
	- Transfer motion from one video to another
	- Change the style of scenes while preserving depth

	## Getting Started

	### Model Selection
	1. Select either "Vace 1.3B" or "Vace 14B" from the dropdown menu
	2. Note: VACE works best with videos of up to 7 seconds when the Riflex option is enabled

	### Input Types

	VACE accepts three types of visual hints (which can be combined):

	#### 1. Control Video
	- Transfer motion or depth to a new video
	- Supply only the first n frames and let VACE extrapolate the rest
	- Perform inpainting by painting the areas to regenerate in grey (value 127)
	- Grey areas are filled in based on the text prompt and the reference images
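The grey-area convention above can be prepared outside the app. The following is a minimal sketch, assuming the control video is already loaded as a `uint8` NumPy array of shape `(num_frames, height, width, 3)`; the function name `grey_out_region` and the rectangular region are hypothetical choices for illustration, not part of WanGP.

```python
import numpy as np

GREY = 127  # the grey value the guide uses to mark areas to regenerate


def grey_out_region(frames, top, left, height, width):
    """Return a copy of `frames` with one rectangle painted grey in every frame.

    `frames` is assumed to be a uint8 array shaped (num_frames, H, W, 3).
    """
    out = frames.copy()
    out[:, top:top + height, left:left + width, :] = GREY
    return out


# Tiny synthetic clip: 4 black frames of 8x8 pixels.
clip = np.zeros((4, 8, 8, 3), dtype=np.uint8)
control = grey_out_region(clip, top=2, left=2, height=4, width=4)
```

Reading and writing real video files is left out on purpose; any frame loader that yields arrays in this layout would fit.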

	#### 2. Reference Images
	- Use as background/setting for the video
	- Inject people or objects of your choice
	- Select multiple reference images
	- Tip: Replace complex backgrounds with white for better object integration
	- Always describe injected objects/people explicitly in your text prompt
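The white-background tip can be applied with a few lines of array code. This is a rough sketch, assuming you already have an RGBA cutout of the object (for example from a background removal tool) as a `uint8` NumPy array; `flatten_onto_white` is a hypothetical helper name.

```python
import numpy as np


def flatten_onto_white(rgba):
    """Composite an RGBA image (uint8, shape H x W x 4) onto a white background."""
    rgb = rgba[..., :3].astype(np.float32)
    alpha = rgba[..., 3:4].astype(np.float32) / 255.0
    white = np.full_like(rgb, 255.0)
    # Standard alpha blend: opaque pixels keep their color, transparent ones turn white.
    out = alpha * rgb + (1.0 - alpha) * white
    return np.rint(out).astype(np.uint8)


# 2x2 example: one opaque red pixel, everything else fully transparent.
cutout = np.zeros((2, 2, 4), dtype=np.uint8)
cutout[0, 0] = (255, 0, 0, 255)
reference = flatten_onto_white(cutout)
```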

	#### 3. Video Mask
	- Gives stronger control over which parts to keep (black) or replace (white)
	- Well suited to inpainting and outpainting
	- Example: a mask that is white everywhere except on the first and last frames (black) keeps those frames and generates the content in between
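The first/last-frame example can be expressed as a mask array. A minimal sketch, assuming a single-channel `uint8` mask of shape `(num_frames, height, width)` with 0 for black (keep) and 255 for white (replace); the function and parameter names are hypothetical.

```python
import numpy as np

KEEP, REPLACE = 0, 255  # black = keep, white = replace


def keep_ends_mask(num_frames, height, width, keep_start=1, keep_end=1):
    """Build a mask video that is white except for black frames at both ends."""
    mask = np.full((num_frames, height, width), REPLACE, dtype=np.uint8)
    mask[:keep_start] = KEEP
    if keep_end > 0:
        mask[num_frames - keep_end:] = KEEP
    return mask


# 10 frames of 4x4 pixels: frame 0 and frame 9 are kept, frames 1-8 are generated.
mask = keep_ends_mask(num_frames=10, height=4, width=4)
```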

	## Common Use Cases

	### Motion Transfer
	Goal: Animate a character of your choice using motion from another video
	Setup:
	- Reference Images: Your character
	- Control Video: Person performing desired motion
	- Text Prompt: Describe your character and the action

	### Object/Person Injection
	Goal: Insert people or objects into a scene
	Setup:
	- Reference Images: The people/objects to inject
	- Text Prompt: Describe the scene and explicitly mention the injected elements

	### Character Animation
	Goal: Animate a character based on text description
	Setup:
	- Control Video: Video of person moving
	- Text Prompt: Detailed description of your character

	### Style Transfer with Depth
	Goal: Change scene style while preserving spatial relationships
	Setup:
	- Control Video: Original video (for depth information)
	- Text Prompt: New style description

	## Integrated Matanyone Tool

	WanGP includes the Matanyone tool, specifically tuned for VACE workflows. It lets you create a control video and its mask at the same time.

	### Creating Face Replacement Masks
	1. Load your video in Matanyone
	2. Click on the face in the first frame
	3. Create a mask for the face
	4. Generate both control video and mask video with "Generate Video Matting"
	5. Export to VACE with "Export to current Video Input and Video Mask"
	6. Load the replacement face image into the Reference Images field

	### Advanced Matanyone Tips
	- Negative Point Prompts: Remove parts from current selection
	- Sub Masks: Create multiple independent masks, then combine them
	- Background Masks: Select everything except the character (useful for background replacement)
	- Enable/disable sub masks in Matanyone settings

	## Recommended Settings

	### Quality Settings
	- Skip Layer Guidance: Turn ON with default configuration for better results
	- Long Prompts: Use detailed descriptions, especially for background elements not in reference images
	- Steps: Use at least 15 steps for good quality, 30+ for best results

	### Sliding Window Settings
	For very long videos, configure sliding windows properly:

	- Window Size: Set appropriate duration for your content
	- Overlap Frames: Long enough for motion continuity, short enough to avoid blur propagation
	- Discard Last Frames: Remove at least 4 frames from each window (VACE 1.3B tends to blur final frames)

	### Background Removal
	VACE includes automatic background removal options:
	- Use for reference images containing people/objects
	- Don't use it for a landscape/setting reference image (this should be the first reference image)
	- Multiple background removal types available

	## Window Sliding for Long Videos

	Generate videos up to 1 minute by merging multiple windows:

	### How It Works
	- Each window uses the corresponding time segment of the control video
	- Example: 0-4s control video → first window, 4-8s → second window, etc.
	- Automatic overlap management ensures smooth transitions
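The window-to-segment mapping in the example above can be sketched as follows. This is a simplified illustration that ignores overlap and discarded frames, which the app manages automatically; `window_segments` is a hypothetical name, not a WanGP function.

```python
def window_segments(total_seconds, window_seconds):
    """Split a control video into consecutive (start, end) segments, one per window."""
    if window_seconds <= 0:
        raise ValueError("window_seconds must be positive")
    total = float(total_seconds)
    segments = []
    start = 0.0
    while start < total:
        end = min(start + window_seconds, total)
        segments.append((start, end))
        start = end
    return segments


# A 10-second control video with 4-second windows needs three windows,
# the last one covering only the remaining 2 seconds.
segments = window_segments(10, 4)
```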

	### Settings
	- Window Size: Duration of each generation window
	- Overlap Frames: Frames shared between windows for continuity
	- Discard Last Frames: Remove poor-quality ending frames
	- Add Overlapped Noise: Reduce quality degradation over time

	### Formula
	```
	Generated Frames = (Windows - 1) × (Window Size - Overlap - Discard) + Window Size
	```
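To make the formula concrete, here is a small sketch that evaluates it and also inverts it to estimate how many windows a target length needs. The function names and the sample values (81-frame windows, 8 overlap frames, 4 discarded frames) are assumptions chosen for illustration only.

```python
def generated_frames(windows, window_size, overlap, discard):
    """Total frames produced, following the formula in this guide."""
    if windows < 1:
        raise ValueError("at least one window is required")
    step = window_size - overlap - discard  # new frames added by each extra window
    return (windows - 1) * step + window_size


def windows_needed(target_frames, window_size, overlap, discard):
    """Smallest number of windows whose output reaches `target_frames`."""
    step = window_size - overlap - discard
    if step <= 0:
        raise ValueError("overlap + discard must be smaller than window_size")
    if target_frames <= window_size:
        return 1
    extra = target_frames - window_size
    return 1 + -(-extra // step)  # integer ceiling division


total = generated_frames(windows=3, window_size=81, overlap=8, discard=4)
print(total)  # 219
```

With these sample values each additional window contributes 69 new frames, so raising overlap or discard directly lowers the frame yield per window.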

	### Multi-Line Prompts (Experimental)
	- Each line of the prompt is used for a different window
	- If there are more windows than prompt lines, the last line is repeated
	- Separate lines with a carriage return (line break)
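The prompt-to-window rule described above can be sketched like this. It is an illustration of the stated behavior, not WanGP's actual code; `prompt_for_window` is a hypothetical name, and skipping blank lines is an assumption of this sketch.

```python
def prompt_for_window(prompt_text, window_index):
    """Pick the prompt line for a window; the last line repeats once lines run out."""
    # Assumption: blank lines are ignored rather than treated as empty prompts.
    lines = [line.strip() for line in prompt_text.splitlines() if line.strip()]
    if not lines:
        raise ValueError("prompt is empty")
    return lines[min(window_index, len(lines) - 1)]


prompt = "A cat walks into a garden\nThe cat jumps onto a bench"
per_window = [prompt_for_window(prompt, i) for i in range(4)]
# Windows 0 and 1 get their own line; windows 2 and 3 reuse the last line.
```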

	## Advanced Features

	### Extend Video
	Click "Extend the Video Sample, Please!" during generation to add more windows dynamically.

	### Noise Addition
	Add noise to overlapped frames to hide accumulated errors and quality degradation.

	### Frame Truncation
	Automatically remove lower-quality final frames from each window (recommended: 4 frames for VACE 1.3B).

	## External Resources

	### Official VACE Resources
	- GitHub: https://github.com/ali-vilab/VACE/tree/main/vace/gradios
	- User Guide: https://github.com/ali-vilab/VACE/blob/main/UserGuide.md
	- Preprocessors: Gradio tools for preparing materials

	### Recommended External Tools
	- Annotation Tools: For creating precise masks
	- Video Editors: For preparing control videos
	- Background Removal: For cleaning reference images

	## Troubleshooting

	### Poor Quality Results
	1. Use longer, more detailed prompts
	2. Enable Skip Layer Guidance
	3. Increase number of steps (30+)
	4. Check reference image quality
	5. Ensure proper mask creation

	### Inconsistent Windows
	1. Increase overlap frames
	2. Use consistent prompting across windows
	3. Add noise to overlapped frames
	4. Reduce discard frames if losing too much content

	### Memory Issues
	1. Use VACE 1.3B instead of 14B
	2. Reduce video length or resolution
	3. Decrease window size
	4. Enable quantization

	### Blurry Results
	1. Reduce overlap frames
	2. Increase discard last frames
	3. Use higher resolution reference images
	4. Check control video quality

	## Tips for Best Results

	1. Detailed Prompts: Describe everything in the scene, especially elements not in reference images
	2. Quality Reference Images: Use high-resolution, well-lit reference images
	3. Proper Masking: Take time to create precise masks with Matanyone
	4. Iterative Approach: Start with short videos, then extend successful results
	5. Background Preparation: Remove complex backgrounds from object/person reference images
	6. Consistent Lighting: Match lighting between reference images and intended scene