# Models Overview
WanGP supports multiple video generation models, each optimized for different use cases and hardware configurations.
## Wan 2.1 Text2Video Models
Note that the term *Text2Video* refers to the underlying Wan architecture. That architecture has improved considerably over time, and many derived Text2Video models can now also generate videos from images.
#### Wan 2.1 Text2Video 1.3B
- **Size**: 1.3 billion parameters
- **VRAM**: 6GB minimum
- **Speed**: Fast generation
- **Quality**: Good quality for the size
- **Best for**: Quick iterations, lower-end hardware
- **Command**: `python wgp.py --t2v-1-3B`
#### Wan 2.1 Text2Video 14B
- **Size**: 14 billion parameters
- **VRAM**: 12GB+ recommended
- **Speed**: Slower but higher quality
- **Quality**: Excellent detail and coherence
- **Best for**: Final production videos
- **Command**: `python wgp.py --t2v-14B`
#### Wan Vace 1.3B
- **Type**: ControlNet for advanced video control
- **VRAM**: 6GB minimum
- **Features**: Motion transfer, object injection, inpainting
- **Best for**: Advanced video manipulation
- **Command**: `python wgp.py --vace-1-3B`
#### Wan Vace 14B
- **Type**: Large ControlNet model
- **VRAM**: 12GB+ recommended
- **Features**: All Vace features with higher quality
- **Best for**: Professional video editing workflows
#### MoviiGen (Experimental)
- **Resolution**: Claims 1080p capability
- **VRAM**: 20GB+ required
- **Speed**: Very slow generation
- **Features**: Intended to generate cinema-like video; specialized for 2.1:1 aspect ratios
- **Status**: Experimental, feedback welcome
<BR>
## Wan 2.1 Image-to-Video Models
#### Wan 2.1 Image2Video 14B
- **Size**: 14 billion parameters
- **VRAM**: 12GB+ recommended
- **Speed**: Slower but higher quality
- **Quality**: Excellent detail and coherence
- **Best for**: Lora-based workflows (most available Loras work with this model)
- **Command**: `python wgp.py --i2v-14B`
#### FLF2V
- **Type**: Start/end frame specialist
- **Resolution**: Optimized for 720p
- **Official**: Wan team supported
- **Use case**: Image-to-video with specific endpoints
<BR>
## Wan 2.1 Specialized Models
#### FantasySpeaking
- **Type**: Talking head animation
- **Input**: Voice track + image
- **Works on**: People and objects
- **Use case**: Lip-sync and voice-driven animation
#### Phantom
- **Type**: Person/object transfer
- **Resolution**: Works well at 720p
- **Requirements**: 30+ steps for good results
- **Best for**: Transferring subjects between videos
#### Recam Master
- **Type**: Viewpoint change
- **Requirements**: 81+ frame input videos, 15+ denoising steps
- **Use case**: View same scene from different angles
#### Sky Reels v2
- **Type**: Diffusion Forcing model
- **Specialty**: "Infinite length" videos
- **Features**: High quality continuous generation
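
The minimums quoted in the Phantom and Recam Master entries above can be checked before starting a long generation. The following is a minimal sketch, not part of WanGP: the `MINIMUMS` table, the `check_settings` helper, and the setting names `steps` and `input_frames` are hypothetical names chosen for illustration, and only the numeric thresholds come from this document.

```python
# Thresholds copied from the Phantom and Recam Master entries in this document.
# The dictionary layout and key names are illustrative assumptions.
MINIMUMS = {
    "Phantom": {"steps": 30},
    "Recam Master": {"steps": 15, "input_frames": 81},
}


def check_settings(model: str, **settings: int) -> list[str]:
    """Return one warning per setting that is below its documented minimum.

    Models without documented minimums, and settings that are not passed,
    produce no warnings.
    """
    warnings = []
    for key, minimum in MINIMUMS.get(model, {}).items():
        value = settings.get(key)
        if value is not None and value < minimum:
            warnings.append(
                f"{model}: {key}={value} is below the documented minimum of {minimum}"
            )
    return warnings


if __name__ == "__main__":
    for line in check_settings("Recam Master", steps=10, input_frames=49):
        print(line)
```

The helper only warns; it does not block generation, since the document describes these values as what is needed "for good results" rather than as hard limits.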
<BR>
## Wan Fun InP Models
#### Wan Fun InP 1.3B
- **Size**: 1.3 billion parameters
- **VRAM**: 6GB minimum
- **Quality**: Good for the size, accessible to lower hardware
- **Best for**: Entry-level image animation
- **Command**: `python wgp.py --i2v-1-3B`
#### Wan Fun InP 14B
- **Size**: 14 billion parameters
- **VRAM**: 12GB+ recommended
- **Quality**: Better end image support
- **Limitation**: Existing Loras don't work as well
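
The launch commands listed so far can be collected into a small lookup. This is a sketch for scripting convenience only: `MODEL_FLAGS` and `launch_command` are hypothetical names, the flags are copied from the **Command** entries in this document, and the script is assumed to be run from the directory that contains `wgp.py`.

```python
import sys

# Flags copied verbatim from the **Command** entries in this document.
MODEL_FLAGS = {
    "Wan 2.1 Text2Video 1.3B": "--t2v-1-3B",
    "Wan 2.1 Text2Video 14B": "--t2v-14B",
    "Wan 2.1 Image2Video 14B": "--i2v-14B",
    "Wan Fun InP 1.3B": "--i2v-1-3B",
}


def launch_command(model: str, python: str = sys.executable) -> list[str]:
    """Build the argument list that starts WanGP with the given model."""
    try:
        flag = MODEL_FLAGS[model]
    except KeyError:
        known = ", ".join(sorted(MODEL_FLAGS))
        raise ValueError(f"no documented flag for {model!r}; known: {known}") from None
    return [python, "wgp.py", flag]


if __name__ == "__main__":
    # To actually launch, pass the list to subprocess.run(..., check=True).
    print(" ".join(launch_command("Wan 2.1 Text2Video 1.3B", python="python")))
```

Models that have no **Command** entry here are left out of the table rather than guessed.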
<BR>
## Wan Special Loras
### Self-Forcing lightx2v Lora
- **Type**: Distilled model (Lora implementation)
- **Speed**: 4-8 step generation, 2x faster (no classifier-free guidance)
- **Compatible**: Works with t2v and i2v Wan 14B models
- **Setup**: Requires Self-Forcing lightx2v Lora (see [LORAS.md](LORAS.md))
### CausVid Lora
- **Type**: Distilled model (Lora implementation)
- **Speed**: 4-12 step generation, 2x faster (no classifier-free guidance)
- **Compatible**: Works with Wan 14B models
- **Setup**: Requires CausVid Lora (see [LORAS.md](LORAS.md))
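
The speed figures above combine two effects: fewer denoising steps, and no classifier-free guidance (CFG). With CFG, each step typically evaluates the model twice (one conditional and one unconditional pass), so dropping it roughly halves the work per step. The sketch below only does that arithmetic; the 30-step baseline is an assumed value for illustration, not a WanGP default, and real timings also depend on resolution, frame count, and whether an implementation batches the two passes.

```python
def denoiser_calls(steps: int, use_cfg: bool) -> int:
    """Count model forward passes for one generation.

    Assumes CFG costs two passes per step (conditional + unconditional)
    and that generation without CFG costs one pass per step.
    """
    if steps < 1:
        raise ValueError("steps must be at least 1")
    return steps * (2 if use_cfg else 1)


baseline = denoiser_calls(30, use_cfg=True)    # assumed baseline: 30 steps with CFG
distilled = denoiser_calls(8, use_cfg=False)   # distilled Lora: 8 steps, no CFG
print(baseline, distilled, baseline / distilled)  # prints: 60 8 7.5
```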
<BR>
## Hunyuan Video Models
#### Hunyuan Video Text2Video
- **Quality**: Among the best open source t2v models
- **VRAM**: 12GB+ recommended
- **Speed**: Slower generation but excellent results
- **Features**: Superior text adherence and video quality, up to 10s of video
- **Best for**: High-quality text-to-video generation
#### Hunyuan Video Custom
- **Specialty**: Identity preservation
- **Use case**: Injecting specific people into videos
- **Quality**: Excellent for character consistency
- **Best for**: Character-focused video generation
#### Hunyuan Video Avatar
- **Specialty**: Generates up to 15s of high-quality speech- or song-driven video
- **Use case**: Injecting specific people into videos
- **Quality**: Excellent for character consistency
- **Best for**: Character-focused video generation, video synchronized with voice
<BR>
## LTX Video Models
#### LTX Video 13B
- **Specialty**: Long video generation
- **Resolution**: Fast 720p generation
- **VRAM**: Optimized by WanGP (4x reduction in requirements)
- **Best for**: Longer duration videos
#### LTX Video 13B Distilled
- **Speed**: Generate in less than one minute
- **Quality**: Very high quality despite speed
- **Best for**: Rapid prototyping and quick results
<BR>
## Model Selection Guide
### By Hardware (VRAM)
#### 6-8GB VRAM
- Wan 2.1 T2V 1.3B
- Wan Fun InP 1.3B
- Wan Vace 1.3B
#### 10-12GB VRAM
- Wan 2.1 T2V 14B
- Wan Fun InP 14B
- Hunyuan Video (with optimizations)
- LTX Video 13B
#### 16GB+ VRAM
- All models supported
- Longer videos possible
- Higher resolutions
- Multiple simultaneous Loras
#### 20GB+ VRAM
- MoviiGen (experimental 1080p)
- Very long videos
- Maximum quality settings
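
The hardware tiers above can be expressed as a simple lookup. This is a minimal sketch: `VRAM_TIERS` and `models_for_vram` are hypothetical names, the cutoffs are the lower bound of each heading, and the 16GB tier adds no named models because the list above does not name any. The result is a starting point, not a guarantee that a model will fit at every resolution or video length.

```python
# (minimum VRAM in GB, models named under that heading in this document)
VRAM_TIERS = [
    (6, ["Wan 2.1 T2V 1.3B", "Wan Fun InP 1.3B", "Wan Vace 1.3B"]),
    (10, [
        "Wan 2.1 T2V 14B",
        "Wan Fun InP 14B",
        "Hunyuan Video (with optimizations)",
        "LTX Video 13B",
    ]),
    (20, ["MoviiGen (experimental 1080p)"]),
]


def models_for_vram(vram_gb: float) -> list[str]:
    """Return every model whose tier cutoff is at or below vram_gb."""
    models: list[str] = []
    for min_gb, names in VRAM_TIERS:
        if vram_gb >= min_gb:
            models.extend(names)
    return models


if __name__ == "__main__":
    for name in models_for_vram(12):
        print(name)
```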
### By Use Case
#### Quick Prototyping
1. **LTX Video 13B Distilled** - Fastest, high quality
2. **Wan 2.1 T2V 1.3B** - Fast, good quality
3. **CausVid Lora** - 4-12 steps, very fast
#### Best Quality
1. **Hunyuan Video** - Overall best t2v quality
2. **Wan 2.1 T2V 14B** - Excellent Wan quality
3. **Wan Vace 14B** - Best for controlled generation
#### Advanced Control
1. **Wan Vace 14B/1.3B** - Motion transfer, object injection
2. **Phantom** - Person/object transfer
3. **FantasySpeaking** - Voice-driven animation
#### Long Videos
1. **LTX Video 13B** - Specialized for length
2. **Sky Reels v2** - Infinite length videos
3. **Wan Vace + Sliding Windows** - Up to 1 minute
#### Lower Hardware
1. **Wan Fun InP 1.3B** - Image-to-video
2. **Wan 2.1 T2V 1.3B** - Text-to-video
3. **Wan Vace 1.3B** - Advanced control
<BR>
## Performance Comparison
### Speed (Relative)
1. **CausVid Lora** (4-12 steps) - Fastest
2. **LTX Video Distilled** - Very fast
3. **Wan 1.3B models** - Fast
4. **Wan 14B models** - Medium
5. **Hunyuan Video** - Slower
6. **MoviiGen** - Slowest
### Quality (Subjective)
1. **Hunyuan Video** - Highest overall
2. **Wan 14B models** - Excellent
3. **LTX Video models** - Very good
4. **Wan 1.3B models** - Good
5. **CausVid** - Good (varies with steps)
### VRAM Efficiency
1. **Wan 1.3B models** - Most efficient
2. **LTX Video** (with WanGP optimizations)
3. **Wan 14B models**
4. **Hunyuan Video**
5. **MoviiGen** - Least efficient
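
The relative speed ranking above can be used to order a shortlist of candidates. The sketch below is illustrative only: `SPEED_RANK` copies the order of the "Speed (Relative)" list, `rank_by_speed` is a hypothetical helper, and the ranking is relative and approximate rather than a benchmark.

```python
# Order copied from the "Speed (Relative)" list, fastest first.
SPEED_RANK = [
    "CausVid Lora",
    "LTX Video Distilled",
    "Wan 1.3B models",
    "Wan 14B models",
    "Hunyuan Video",
    "MoviiGen",
]


def rank_by_speed(candidates: list[str]) -> list[str]:
    """Sort candidates from fastest to slowest using SPEED_RANK.

    Raises ValueError for a name that does not appear in the ranking.
    """
    return sorted(candidates, key=SPEED_RANK.index)


if __name__ == "__main__":
    print(rank_by_speed(["Hunyuan Video", "Wan 1.3B models", "Wan 14B models"]))
```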
<BR>
## Model Switching
WanGP allows switching between models without restarting:
1. Use the dropdown menu in the web interface
2. Models are loaded on-demand
3. Previous model is unloaded to save VRAM
4. Settings are preserved when possible
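
The load-on-demand behavior described above can be pictured as a single slot that holds one model at a time. This is a conceptual sketch, not WanGP's actual implementation: `SingleModelSlot` and the loader callables are hypothetical, and a real GPU application would typically also release cached device memory after dropping the old model.

```python
from typing import Callable, Dict, Optional


class SingleModelSlot:
    """Keep at most one model resident and load others only when requested."""

    def __init__(self, loaders: Dict[str, Callable[[], object]]):
        self._loaders = loaders              # model name -> function that loads it
        self._name: Optional[str] = None     # name of the resident model, if any
        self._model: Optional[object] = None

    def get(self, name: str) -> object:
        if name not in self._loaders:
            raise KeyError(f"unknown model: {name}")
        if name != self._name:
            # Drop the previous model first so its memory can be reclaimed
            # before the next one is loaded.
            self._model = None
            self._name = None
            self._model = self._loaders[name]()
            self._name = name
        return self._model


if __name__ == "__main__":
    slot = SingleModelSlot({
        "t2v-1.3B": lambda: "stand-in for the 1.3B model",
        "t2v-14B": lambda: "stand-in for the 14B model",
    })
    print(slot.get("t2v-1.3B"))
    print(slot.get("t2v-14B"))  # the 1.3B stand-in is released first
```

Requesting the model that is already resident returns it without reloading, which is why switching back and forth between two models costs a load each time while repeated use of one model does not.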
<BR>
## Tips for Model Selection
### First Time Users
Start with **Wan 2.1 T2V 1.3B** to learn the interface and test your hardware.
### Production Work
Use **Hunyuan Video** or **Wan 14B** models for final output quality.
### Experimentation
**CausVid Lora** or **LTX Distilled** for rapid iteration and testing.
### Specialized Tasks
- **VACE** for advanced control
- **FantasySpeaking** for talking heads
- **LTX Video** for long sequences
### Hardware Optimization
Always start with the largest model your VRAM can handle, then optimize settings for speed vs. quality based on your needs.