# Models Overview

WanGP supports multiple video generation models, each optimized for different use cases and hardware configurations. 


## Wan 2.1 Text2Video Models
Note that the term *Text2Video* refers to the underlying Wan architecture. Because that architecture has been greatly improved over time, many derived Text2Video models can now also generate videos from images.

#### Wan 2.1 Text2Video 1.3B
- **Size**: 1.3 billion parameters
- **VRAM**: 6GB minimum
- **Speed**: Fast generation
- **Quality**: Good quality for the size
- **Best for**: Quick iterations, lower-end hardware
- **Command**: `python wgp.py --t2v-1-3B`

#### Wan 2.1 Text2Video 14B
- **Size**: 14 billion parameters  
- **VRAM**: 12GB+ recommended
- **Speed**: Slower but higher quality
- **Quality**: Excellent detail and coherence
- **Best for**: Final production videos
- **Command**: `python wgp.py --t2v-14B`

#### Wan Vace 1.3B
- **Type**: ControlNet for advanced video control
- **VRAM**: 6GB minimum
- **Features**: Motion transfer, object injection, inpainting
- **Best for**: Advanced video manipulation
- **Command**: `python wgp.py --vace-1.3B`

#### Wan Vace 14B
- **Type**: Large ControlNet model
- **VRAM**: 12GB+ recommended
- **Features**: All Vace features with higher quality
- **Best for**: Professional video editing workflows

#### MoviiGen (Experimental)
- **Resolution**: Claims 1080p capability
- **VRAM**: 20GB+ required
- **Speed**: Very slow generation
- **Features**: Intended to generate cinema-like video; specialized for 2.1:1 aspect ratios
- **Status**: Experimental, feedback welcome

<BR>

## Wan 2.1 Image-to-Video Models

#### Wan 2.1 Image2Video 14B
- **Size**: 14 billion parameters  
- **VRAM**: 12GB+ recommended
- **Speed**: Slower but higher quality
- **Quality**: Excellent detail and coherence
- **Best for**: Lora-based workflows, since most available Loras work with this model
- **Command**: `python wgp.py --i2v-14B`

#### FLF2V
- **Type**: Start/end frame specialist
- **Resolution**: Optimized for 720p
- **Official**: Wan team supported
- **Use case**: Image-to-video with specific endpoints


<BR>

## Wan 2.1 Specialized Models

#### FantasySpeaking
- **Type**: Talking head animation
- **Input**: Voice track + image
- **Works on**: People and objects
- **Use case**: Lip-sync and voice-driven animation

#### Phantom
- **Type**: Person/object transfer
- **Resolution**: Works well at 720p
- **Requirements**: 30+ steps for good results
- **Best for**: Transferring subjects between videos

#### Recam Master
- **Type**: Viewpoint change
- **Requirements**: 81+ frame input videos, 15+ denoising steps
- **Use case**: View same scene from different angles

#### Sky Reels v2
- **Type**: Diffusion Forcing model
- **Specialty**: "Infinite length" videos
- **Features**: High quality continuous generation
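
The minimum settings quoted for Phantom and Recam Master can be checked before starting a long generation. The sketch below is only an illustration: the `MIN_REQUIREMENTS` table and the `unmet_requirements` helper are hypothetical names, not part of WanGP, and the numbers are simply the ones listed in this section.

```python
from typing import List, Optional

# Minimums transcribed from the model descriptions in this section.
# Models that state no minimum are simply absent from the table.
MIN_REQUIREMENTS = {
    "Phantom": {"steps": 30},
    "Recam Master": {"steps": 15, "input_frames": 81},
}


def unmet_requirements(
    model: str, steps: int, input_frames: Optional[int] = None
) -> List[str]:
    """Return human-readable problems; an empty list means nothing was violated."""
    minimums = MIN_REQUIREMENTS.get(model, {})
    problems = []
    if steps < minimums.get("steps", 0):
        problems.append(f"{model}: {steps} steps, needs {minimums['steps']}+")
    if "input_frames" in minimums:
        needed = minimums["input_frames"]
        # A missing input video counts as a violation for models that need one.
        if input_frames is None or input_frames < needed:
            problems.append(f"{model}: input video needs {needed}+ frames")
    return problems


if __name__ == "__main__":
    for problem in unmet_requirements("Recam Master", steps=10, input_frames=60):
        print(problem)
```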


<BR>

## Wan Fun InP Models

#### Wan Fun InP 1.3B
- **Size**: 1.3 billion parameters
- **VRAM**: 6GB minimum
- **Quality**: Good for the size, accessible to lower hardware
- **Best for**: Entry-level image animation
- **Command**: `python wgp.py --i2v-1-3B`

#### Wan Fun InP 14B
- **Size**: 14 billion parameters
- **VRAM**: 12GB+ recommended
- **Quality**: Better end image support
- **Limitation**: Existing loras don't work as well
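
For scripted launches, the `Command` entries listed so far can be collected in one lookup table. This is a minimal sketch, not a WanGP feature: `LAUNCH_FLAGS` and `launch_command` are hypothetical names, and only the flags that appear in this document are included. Models with no documented command are left out rather than guessed.

```python
import shlex
from typing import List

# Flags copied from the "Command" entries in this document.
LAUNCH_FLAGS = {
    "Wan 2.1 Text2Video 1.3B": "--t2v-1-3B",
    "Wan 2.1 Text2Video 14B": "--t2v-14B",
    "Wan Vace 1.3B": "--vace-1.3B",
    "Wan 2.1 Image2Video 14B": "--i2v-14B",
    "Wan Fun InP 1.3B": "--i2v-1-3B",
}


def launch_command(model_name: str) -> List[str]:
    """Return the argv list that starts WanGP with the given model selected."""
    try:
        flag = LAUNCH_FLAGS[model_name]
    except KeyError:
        raise ValueError(f"No documented launch flag for {model_name!r}") from None
    return ["python", "wgp.py", flag]


if __name__ == "__main__":
    # Print the command instead of running it, so the sketch has no side effects.
    print(shlex.join(launch_command("Wan 2.1 Text2Video 14B")))
```

The returned list can be passed directly to `subprocess.run` from the directory that contains `wgp.py`.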

<BR>

## Wan Special Loras
#### Safe-Forcing lightx2v Lora
- **Type**: Distilled model (Lora implementation)
- **Speed**: 4-8 step generation, 2x faster (no classifier-free guidance)
- **Compatible**: Works with t2v and i2v Wan 14B models
- **Setup**: Requires Safe-Forcing lightx2v Lora (see [LORAS.md](LORAS.md))


#### CausVid Lora
- **Type**: Distilled model (Lora implementation)
- **Speed**: 4-12 step generation, 2x faster (no classifier-free guidance)
- **Compatible**: Works with Wan 14B models
- **Setup**: Requires CausVid Lora (see [LORAS.md](LORAS.md))
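
The "2x faster" figure is consistent with how classifier-free guidance (CFG) is commonly run: each denoising step evaluates the model twice, once with the prompt and once without. The arithmetic below is a rough sketch under that assumption. The 30-step baseline is an illustrative value, not a WanGP default, and wall-clock time also depends on work outside the denoising loop, such as text encoding and VAE decoding.

```python
def model_evaluations(steps: int, uses_cfg: bool) -> int:
    """Count denoiser forward passes, assuming CFG doubles the work per step."""
    return steps * (2 if uses_cfg else 1)


BASELINE_STEPS = 30  # assumption for illustration only

baseline = model_evaluations(BASELINE_STEPS, uses_cfg=True)
for distilled_steps in (4, 8, 12):
    distilled = model_evaluations(distilled_steps, uses_cfg=False)
    print(
        f"{distilled_steps} steps without CFG: {distilled} evaluations, "
        f"{baseline / distilled:.1f}x fewer than {BASELINE_STEPS} steps with CFG"
    )
```

Most of the saving comes from the lower step count; dropping CFG contributes the remaining factor of two.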


<BR>

## Hunyuan Video Models

#### Hunyuan Video Text2Video
- **Quality**: Among the best open source t2v models
- **VRAM**: 12GB+ recommended
- **Speed**: Slower generation but excellent results
- **Features**: Superior text adherence and video quality, up to 10s of video
- **Best for**: High-quality text-to-video generation

#### Hunyuan Video Custom
- **Specialty**: Identity preservation
- **Use case**: Injecting specific people into videos
- **Quality**: Excellent for character consistency
- **Best for**: Character-focused video generation

#### Hunyuan Video Avatar
- **Specialty**: Generates up to 15s of high-quality speech- or song-driven video
- **Use case**: Injecting specific people into videos
- **Quality**: Excellent for character consistency
- **Best for**: Character-focused video generation, video synchronized with voice


<BR>

## LTX Video Models

#### LTX Video 13B
- **Specialty**: Long video generation
- **Resolution**: Fast 720p generation
- **VRAM**: Optimized by WanGP (4x reduction in requirements)
- **Best for**: Longer duration videos

#### LTX Video 13B Distilled
- **Speed**: Generate in less than one minute
- **Quality**: Very high quality despite speed
- **Best for**: Rapid prototyping and quick results

<BR>

## Model Selection Guide

### By Hardware (VRAM)

#### 6-8GB VRAM
- Wan 2.1 T2V 1.3B
- Wan Fun InP 1.3B
- Wan Vace 1.3B

#### 10-12GB VRAM
- Wan 2.1 T2V 14B
- Wan Fun InP 14B
- Hunyuan Video (with optimizations)
- LTX Video 13B

#### 16GB+ VRAM
- All models supported
- Longer videos possible
- Higher resolutions
- Multiple simultaneous Loras

#### 20GB+ VRAM
- MoviiGen (experimental 1080p)
- Very long videos
- Maximum quality settings
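
The tiers above can be expressed as a simple lookup. The following is a sketch under stated assumptions: `VRAM_TIERS` and `candidate_models` are hypothetical names, the lower bound of each range is used as the threshold, and the 16GB+ tier is omitted because it lists capabilities rather than specific models.

```python
from typing import List

# (minimum VRAM in GB, models named for that tier in the list above)
VRAM_TIERS = [
    (6, ["Wan 2.1 T2V 1.3B", "Wan Fun InP 1.3B", "Wan Vace 1.3B"]),
    (10, [
        "Wan 2.1 T2V 14B",
        "Wan Fun InP 14B",
        "Hunyuan Video (with optimizations)",
        "LTX Video 13B",
    ]),
    (20, ["MoviiGen (experimental 1080p)"]),
]


def candidate_models(vram_gb: float) -> List[str]:
    """Return every model whose tier threshold fits within the given VRAM."""
    models = []
    for minimum_gb, tier_models in VRAM_TIERS:
        if vram_gb >= minimum_gb:
            models.extend(tier_models)
    return models


if __name__ == "__main__":
    for vram in (8, 12, 24):
        print(f"{vram}GB: {', '.join(candidate_models(vram))}")
```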

### By Use Case

#### Quick Prototyping
1. **LTX Video 13B Distilled** - Fastest, high quality
2. **Wan 2.1 T2V 1.3B** - Fast, good quality
3. **CausVid Lora** - 4-12 steps, very fast

#### Best Quality
1. **Hunyuan Video** - Overall best t2v quality
2. **Wan 2.1 T2V 14B** - Excellent Wan quality
3. **Wan Vace 14B** - Best for controlled generation

#### Advanced Control
1. **Wan Vace 14B/1.3B** - Motion transfer, object injection
2. **Phantom** - Person/object transfer
3. **FantasySpeaking** - Voice-driven animation

#### Long Videos
1. **LTX Video 13B** - Specialized for length
2. **Sky Reels v2** - Infinite length videos
3. **Wan Vace + Sliding Windows** - Up to 1 minute

#### Lower Hardware
1. **Wan Fun InP 1.3B** - Image-to-video
2. **Wan 2.1 T2V 1.3B** - Text-to-video
3. **Wan Vace 1.3B** - Advanced control

<BR>

## Performance Comparison

### Speed (Relative)
1. **CausVid Lora** (4-12 steps) - Fastest
2. **LTX Video Distilled** - Very fast
3. **Wan 1.3B models** - Fast
4. **Wan 14B models** - Medium
5. **Hunyuan Video** - Slower
6. **MoviiGen** - Slowest

### Quality (Subjective)
1. **Hunyuan Video** - Highest overall
2. **Wan 14B models** - Excellent
3. **LTX Video models** - Very good
4. **Wan 1.3B models** - Good
5. **CausVid** - Good (varies with steps)

### VRAM Efficiency
1. **Wan 1.3B models** - Most efficient
2. **LTX Video** (with WanGP optimizations)
3. **Wan 14B models**
4. **Hunyuan Video**
5. **MoviiGen** - Least efficient
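
The speed and quality rankings can be combined to answer a question such as "what is the fastest option that is at least this good?". The sketch below is illustrative only. The two rankings use slightly different names, so the `QUALITY_FAMILY` mapping is an assumption made here to line them up, and MoviiGen is never selected because it has no entry in the quality ranking.

```python
from typing import Optional

# Orderings transcribed from the "Speed" and "Quality" lists above.
SPEED_ORDER = [
    "CausVid Lora",
    "LTX Video Distilled",
    "Wan 1.3B models",
    "Wan 14B models",
    "Hunyuan Video",
    "MoviiGen",
]
QUALITY_ORDER = [
    "Hunyuan Video",
    "Wan 14B models",
    "LTX Video models",
    "Wan 1.3B models",
    "CausVid",
]
# Assumption: these speed entries correspond to these quality entries.
QUALITY_FAMILY = {
    "CausVid Lora": "CausVid",
    "LTX Video Distilled": "LTX Video models",
}


def quality_rank(name: str) -> Optional[int]:
    """1-based position in the quality list, or None if the model is unranked."""
    family = QUALITY_FAMILY.get(name, name)
    if family in QUALITY_ORDER:
        return QUALITY_ORDER.index(family) + 1
    return None


def fastest_with_quality(worst_acceptable_rank: int) -> Optional[str]:
    """Walk the speed list and return the first entry that is good enough."""
    for name in SPEED_ORDER:
        rank = quality_rank(name)
        if rank is not None and rank <= worst_acceptable_rank:
            return name
    return None


if __name__ == "__main__":
    for rank in (1, 2, 3):
        print(f"quality rank <= {rank}: {fastest_with_quality(rank)}")
```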

<BR>

## Model Switching

WanGP allows switching between models without restarting:

1. Use the dropdown menu in the web interface
2. Models are loaded on-demand
3. Previous model is unloaded to save VRAM
4. Settings are preserved when possible
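
The load-on-demand behavior described in steps 2 and 3 can be pictured with a small sketch. This is not WanGP's implementation: `ModelSwitcher` and its loader callables are hypothetical, and a real GPU application would also need to release device memory (for example with `torch.cuda.empty_cache()` in PyTorch) after dropping the old model.

```python
import gc
from typing import Callable, Dict, Optional


class ModelSwitcher:
    """Keep at most one model in memory and load others only when requested."""

    def __init__(self, loaders: Dict[str, Callable[[], object]]):
        self._loaders = loaders
        self._current_name: Optional[str] = None
        self._current_model: Optional[object] = None

    @property
    def current_name(self) -> Optional[str]:
        return self._current_name

    def get(self, name: str) -> object:
        if name == self._current_name:
            return self._current_model  # already loaded, nothing to do
        if name not in self._loaders:
            raise KeyError(f"Unknown model: {name!r}")
        # Drop the previous model first so two models are never resident at once.
        self._current_model = None
        self._current_name = None
        gc.collect()
        self._current_model = self._loaders[name]()
        self._current_name = name
        return self._current_model


if __name__ == "__main__":
    switcher = ModelSwitcher({
        "t2v": lambda: "stand-in for a text-to-video model",
        "i2v": lambda: "stand-in for an image-to-video model",
    })
    print(switcher.get("t2v"))
    print(switcher.get("i2v"))
```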

<BR>

## Tips for Model Selection

### First Time Users
Start with **Wan 2.1 T2V 1.3B** to learn the interface and test your hardware.

### Production Work
Use **Hunyuan Video** or **Wan 14B** models for final output quality.

### Experimentation
**CausVid Lora** or **LTX Distilled** for rapid iteration and testing.

### Specialized Tasks
- **VACE** for advanced control
- **FantasySpeaking** for talking heads
- **LTX Video** for long sequences

### Hardware Optimization
Always start with the largest model your VRAM can handle, then optimize settings for speed vs quality based on your needs.