diff --git a/.gitattributes b/.gitattributes
index a6344aac8c09253b3b630fb776ae94478aa0275b..6a268f8d40000acb896c80f21dd37d300a018bc9 100644
--- a/.gitattributes
+++ b/.gitattributes
@@ -33,3 +33,10 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
+assets/comp_effic.png filter=lfs diff=lfs merge=lfs -text
+assets/data_for_diff_stage.jpg filter=lfs diff=lfs merge=lfs -text
+assets/i2v_res.png filter=lfs diff=lfs merge=lfs -text
+assets/t2v_res.jpg filter=lfs diff=lfs merge=lfs -text
+assets/vben_vs_sota.png filter=lfs diff=lfs merge=lfs -text
+assets/video_dit_arch.jpg filter=lfs diff=lfs merge=lfs -text
+assets/video_vae_res.jpg filter=lfs diff=lfs merge=lfs -text
diff --git a/.gitignore b/.gitignore
new file mode 100644
index 0000000000000000000000000000000000000000..1a7a5c0a3450741744d7e8e4bfd960bd44f51d43
--- /dev/null
+++ b/.gitignore
@@ -0,0 +1,42 @@
+.*
+*.py[cod]
+# *.jpg
+*.jpeg
+# *.png
+*.gif
+*.bmp
+*.mp4
+*.mov
+*.mkv
+*.log
+*.zip
+*.pt
+*.pth
+*.ckpt
+*.safetensors
+*.json
+# *.txt
+*.backup
+*.pkl
+*.html
+*.pdf
+*.whl
+*.exe
+cache
+__pycache__/
+storage/
+samples/
+!.gitignore
+!requirements.txt
+.DS_Store
+*DS_Store
+google/
+Wan2.1-T2V-14B/
+Wan2.1-T2V-1.3B/
+Wan2.1-I2V-14B-480P/
+Wan2.1-I2V-14B-720P/
+outputs/
+gradio_outputs/
+ckpts/
+loras/
+loras_i2v/
diff --git a/LICENSE.txt b/LICENSE.txt
new file mode 100644
index 0000000000000000000000000000000000000000..4bc6bc6f273d1e13d1aba10a9e41aeaf2f7628a1
--- /dev/null
+++ b/LICENSE.txt
@@ -0,0 +1,17 @@
+FREE for Non Commercial USE
+
+You are free to:
+- Share — copy and redistribute the material in any medium or format
+- Adapt — remix, transform, and build upon the material
+The licensor cannot revoke these freedoms as long as you follow the license terms.
+
+Under the following terms:
+- Attribution — You must give appropriate credit, provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use.
+- NonCommercial — You may not use the material for commercial purposes.
+
+- No additional restrictions — You may not apply legal terms or technological measures that legally restrict others from doing anything the license permits.
+Notices:
+
+- You do not have to comply with the license for elements of the material in the public domain or where your use is permitted by an applicable exception or limitation.
+
+No warranties are given. The license may not give you all of the permissions necessary for your intended use. For example, other rights such as publicity, privacy, or moral rights may limit how you use the material.
\ No newline at end of file
diff --git a/README.md b/README.md
index f063ba8cae06c0baa2a0d41d189ac5a1972ba7b2..5cd249fad23c7e758f61ff4b8563501a6e869471 100644
--- a/README.md
+++ b/README.md
@@ -1,12 +1,127 @@
----
-title: Wan2GP
-emoji: ⚡
-colorFrom: green
-colorTo: pink
-sdk: gradio
-sdk_version: 5.34.0
-app_file: app.py
-pinned: false
----
-
-Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
+---
+title: Wan2GP
+app_file: wgp.py
+sdk: gradio
+sdk_version: 5.23.0
+---
+# WanGP
+
+-----
+
+WanGP by DeepBeepMeep: The best Open Source Video Generative Models Accessible to the GPU Poor
+
+WanGP supports the Wan (and derived models), Hunyuan Video and LTX Video models with:
+- Low VRAM requirements (as low as 6 GB of VRAM is sufficient for certain models)
+- Support for old GPUs (RTX 10XX, 20XX, ...)
+- Very fast on the latest GPUs
+- Easy-to-use, fully web-based interface
+- Auto download of the required model, adapted to your specific architecture
+- Integrated tools to facilitate video generation: Mask Editor, Prompt Enhancer, Temporal and Spatial Upsampling
+- Loras support to customize each model
+- Queuing system: make your shopping list of videos to generate and come back later
+
+**Discord Server to get help from other users and show your best videos:** https://discord.gg/g7efUW9jGV
+
+**Follow DeepBeepMeep on Twitter/X to get the latest news**: https://x.com/deepbeepmeep
+
+## 🔥 Latest Updates
+### June 11 2025: WanGP v5.5
+👋 *Hunyuan Video Custom Audio*: similar to Hunyuan Video Avatar, except there is no lower limit on the number of frames and you can use your reference images in a context different from the image itself\
+*Hunyuan Video Custom Edit*: Hunyuan Video ControlNet; use it to do inpainting and replace a person in a video while keeping their pose. Similar to Vace but less restricted than the Wan models in terms of content...
+
+
+### June 6 2025: WanGP v5.41
+👋 Bonus release: support for the **AccVideo** Lora to speed up Wan video generations by 2x. Check the Loras documentation for AccVideo usage instructions.\
+You will need to do a *pip install -r requirements.txt*
+
+### June 6 2025: WanGP v5.4
+👋 World exclusive: **Hunyuan Video Avatar** support! You won't need 80 GB of VRAM, nor even 32 GB: just 10 GB of VRAM is sufficient to generate up to 15s of high-quality speech/song-driven video at high speed with no quality degradation. Support for TeaCache included.\
+Here is a link to the original repo, where you will find some very interesting documentation and examples: https://github.com/Tencent-Hunyuan/HunyuanVideo-Avatar. Kudos to the Hunyuan Video Avatar team for the best model of its kind.\
+Also many thanks to Reevoy24 for repackaging the release and completing the documentation
+
+### May 28 2025: WanGP v5.31
+👋 Added **Phantom 14B**, a model that you can use to transfer objects / people into a video. My preference goes to Vace, which remains the king of controlnets.
+VACE improvements: better sliding window transitions, image mask support in Matanyone, new Extend Video feature, and enhanced background removal options.
+
+### May 26, 2025: WanGP v5.3
+👋 Settings management revolution! Now you can:
+- Select any generated video and click *Use Selected Video Settings* to instantly reuse its configuration
+- Drag & drop videos to automatically extract their settings metadata
+- Export/import settings as JSON files for easy sharing and backup
+
+### May 20, 2025: WanGP v5.2
+👋 **CausVid support** - Generate videos in just 4-12 steps with the new distilled Wan model! Also added experimental MoviiGen for 1080p generation (20GB+ VRAM required). Check the Loras documentation for CausVid usage instructions.
+
+### May 18, 2025: WanGP v5.1
+👋 **LTX Video 13B Distilled** - Generate high-quality videos in less than one minute!
+
+### May 17, 2025: WanGP v5.0
+👋 **One App to Rule Them All!** Added Hunyuan Video and LTX Video support, plus Vace 14B and an integrated prompt enhancer.
+
+See full changelog: **[Changelog](docs/CHANGELOG.md)**
+
+## 📋 Table of Contents
+
+- [🚀 Quick Start](#-quick-start)
+- [📦 Installation](#-installation)
+- [🎯 Usage](#-usage)
+- [📚 Documentation](#-documentation)
+- [🔗 Related Projects](#-related-projects)
+
+## 🚀 Quick Start
+
+**One-click installation:** Get started instantly with [Pinokio App](https://pinokio.computer/)
+
+**Manual installation:**
+```bash
+git clone https://github.com/deepbeepmeep/Wan2GP.git
+cd Wan2GP
+conda create -n wan2gp python=3.10.9
+conda activate wan2gp
+pip install torch==2.6.0 torchvision torchaudio --index-url https://download.pytorch.org/whl/test/cu124
+pip install -r requirements.txt
+```
+
+**Run the application:**
+```bash
+python wgp.py # Text-to-video (default)
+python wgp.py --i2v # Image-to-video
+```
+
+## 📦 Installation
+
+For detailed installation instructions for different GPU generations:
+- **[Installation Guide](docs/INSTALLATION.md)** - Complete setup instructions for RTX 10XX to RTX 50XX
+
+## 🎯 Usage
+
+### Basic Usage
+- **[Getting Started Guide](docs/GETTING_STARTED.md)** - First steps and basic usage
+- **[Models Overview](docs/MODELS.md)** - Available models and their capabilities
+
+### Advanced Features
+- **[Loras Guide](docs/LORAS.md)** - Using and managing Loras for customization
+- **[VACE ControlNet](docs/VACE.md)** - Advanced video control and manipulation
+- **[Command Line Reference](docs/CLI.md)** - All available command line options
+
+## 📚 Documentation
+
+- **[Changelog](docs/CHANGELOG.md)** - Latest updates and version history
+- **[Troubleshooting](docs/TROUBLESHOOTING.md)** - Common issues and solutions
+
+## 🔗 Related Projects
+
+### Other Models for the GPU Poor
+- **[HunyuanVideoGP](https://github.com/deepbeepmeep/HunyuanVideoGP)** - One of the best open source Text to Video generators
+- **[Hunyuan3D-2GP](https://github.com/deepbeepmeep/Hunyuan3D-2GP)** - Image to 3D and text to 3D tool
+- **[FluxFillGP](https://github.com/deepbeepmeep/FluxFillGP)** - Inpainting/outpainting tools based on Flux
+- **[Cosmos1GP](https://github.com/deepbeepmeep/Cosmos1GP)** - Text to world generator and image/video to world
+- **[OminiControlGP](https://github.com/deepbeepmeep/OminiControlGP)** - Flux-derived application for object transfer
+- **[YuE GP](https://github.com/deepbeepmeep/YuEGP)** - Song generator with instruments and singer's voice
+
+---
+
+Made with ❤️ by DeepBeepMeep
diff --git a/assets/comp_effic.png b/assets/comp_effic.png
new file mode 100644
index 0000000000000000000000000000000000000000..741f12abd4bc11efd6177e7c59765d87eaf7e395
--- /dev/null
+++ b/assets/comp_effic.png
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:b0e225caffb4b31295ad150f95ee852e4c3dde4a00ac8f79a2ff500f2ce26b8d
+size 1793594
diff --git a/assets/data_for_diff_stage.jpg b/assets/data_for_diff_stage.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..a7ba97f116a3e3304d9960069344019787181368
--- /dev/null
+++ b/assets/data_for_diff_stage.jpg
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:59aec08409f2d46b0e640e4e120dc7cca52c08c3de56d026602dbcff1ebf241a
+size 528268
diff --git a/assets/i2v_res.png b/assets/i2v_res.png
new file mode 100644
index 0000000000000000000000000000000000000000..98470f121ae318c11d25fd3728cd5c93e0c6993d
--- /dev/null
+++ b/assets/i2v_res.png
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:6823b3206d8d0cb18d3b5b949dec1217f1178109ba11f14e977b67e1f7b8a248
+size 891681
diff --git a/assets/logo.png b/assets/logo.png
new file mode 100644
index 0000000000000000000000000000000000000000..0c55854cbd9692975f217714ffd83fd4b37f5dca
Binary files /dev/null and b/assets/logo.png differ
diff --git a/assets/t2v_res.jpg b/assets/t2v_res.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..7549a1f66d7aa8fb90b6e6181188efc1be0edc28
--- /dev/null
+++ b/assets/t2v_res.jpg
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:91db579092446be2a834bc67721a8e4346936f38c4edb912f459ca3e10f8f439
+size 301030
diff --git a/assets/vben_vs_sota.png b/assets/vben_vs_sota.png
new file mode 100644
index 0000000000000000000000000000000000000000..cded47bc519dc2aeae2f370228209e8c9e74bc0b
--- /dev/null
+++ b/assets/vben_vs_sota.png
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:9a0e86ca85046d2675f97984b88b6e74df07bba8a62a31ab8a1aef50d4eda44e
+size 1552119
diff --git a/assets/video_dit_arch.jpg b/assets/video_dit_arch.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..97d9c19d286b432c33d644d5b00061c2e2a3545a
--- /dev/null
+++ b/assets/video_dit_arch.jpg
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:195dceec6570289d8b01cc51d2e28a7786216f19de55b23978a52610d1646a66
+size 643369
diff --git a/assets/video_vae_res.jpg b/assets/video_vae_res.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..91ca92abf061f569b335f3b8ca63e796ce2f6103
--- /dev/null
+++ b/assets/video_vae_res.jpg
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:d8f9e7f7353848056a615c8ef35ab86ec22976bb46cb27405008b4089701945c
+size 212586
diff --git a/docs/CHANGELOG.md b/docs/CHANGELOG.md
new file mode 100644
index 0000000000000000000000000000000000000000..20ddfcaa346abf2059e1c71df4492e5b62dce663
--- /dev/null
+++ b/docs/CHANGELOG.md
@@ -0,0 +1,157 @@
+# Changelog
+
+## 🔥 Latest News
+### June 11 2025: WanGP v5.5
+👋 *Hunyuan Video Custom Audio*: similar to Hunyuan Video Avatar, except there is no lower limit on the number of frames and you can use your reference images in a context different from the image itself\
+*Hunyuan Video Custom Edit*: Hunyuan Video ControlNet; use it to do inpainting and replace a person in a video while keeping their pose. Similar to Vace but less restricted than the Wan models in terms of content...
+
+### June 6 2025: WanGP v5.41
+👋 Bonus release: support for the **AccVideo** Lora to speed up Wan video generations by 2x. Check the Loras documentation for AccVideo usage instructions.
+
+### June 6 2025: WanGP v5.4
+👋 World exclusive: Hunyuan Video Avatar support! You won't need 80 GB of VRAM, nor even 32 GB: just 10 GB of VRAM is sufficient to generate up to 15s of high-quality speech/song-driven video at high speed with no quality degradation. Support for TeaCache included.
+
+### May 26, 2025: WanGP v5.3
+👋 Happy with a video generation and want to do more generations using the same settings, but you can't remember what you did, or you find it too hard to copy/paste each setting one by one from the file metadata? Rejoice! There are now multiple ways to turn this tedious process into a one-click task:
+- Select a video recently generated in the Video Gallery and click *Use Selected Video Settings*
+- Click *Drop File Here* and select a video you saved somewhere; if the settings metadata have been saved with the video, you will be able to extract them automatically
+- Click *Export Settings to File* to save the current settings on your hard drive. You will be able to reuse them later by clicking *Drop File Here* and this time selecting a settings JSON file
+
+### May 23, 2025: WanGP v5.21
+👋 Improvements for Vace: better transitions between sliding windows, support for image masks in Matanyone, new Extend Video feature for Vace, different types of automated background removal
+
+### May 20, 2025: WanGP v5.2
+👋 Added support for Wan CausVid, a distilled Wan model that can generate nice-looking videos in only 4 to 12 steps. The great thing is that Kijai (kudos to him!) has created a CausVid Lora that can be combined with any existing Wan 14B t2v model, such as Wan Vace 14B. See [LORAS.md](LORAS.md) for instructions on how to use CausVid.
+
+Also, as an experiment, I have added support for MoviiGen, the first model that claims to be capable of generating 1080p videos (provided you have enough VRAM (20GB...) and are ready to wait a long time...). Don't hesitate to share your impressions on the Discord server.
+
+### May 18, 2025: WanGP v5.1
+👋 Bonus day, added LTX Video 13B Distilled: generate very high quality videos in less than one minute!
+
+### May 17, 2025: WanGP v5.0
+👋 One App to Rule Them All! Added support for the other great open source architectures:
+- **Hunyuan Video**: text 2 video (one of the best, if not the best, t2v), image 2 video, and the recently released Hunyuan Custom (very good identity preservation when injecting a person into a video)
+- **LTX Video 13B** (released last week): very long video support and fast 720p generation. The WanGP version has been greatly optimized, reducing LTX Video's VRAM requirements by a factor of 4!
+
+Also:
+- Added support for the best control video model, released 2 days ago: Vace 14B
+- New integrated prompt enhancer to increase the quality of the generated videos
+
+*You will need one more `pip install -r requirements.txt`*
+
+### May 5, 2025: WanGP v4.5
+👋 FantasySpeaking model: you can animate a talking head using a voice track. This works not only on people but also on objects. Also better seamless transitions between Vace sliding windows for very long videos.
+New high quality processing features (mixed 16/32 bits calculation and 32 bits VAE)
+
+### April 27, 2025: WanGP v4.4
+👋 Phantom model support: a very good model for transferring people or objects into a video. Works quite well at 720p and with more than 30 steps
+
+### April 25, 2025: WanGP v4.3
+👋 Added preview mode and support for Sky Reels v2 Diffusion Forcing for high quality "infinite length videos". Note that SkyReels uses causal attention, which is only supported by SDPA attention, so even if you choose another type of attention, some of the processing will use SDPA attention.
+
+### April 18, 2025: WanGP v4.2
+👋 FLF2V model support: official support from Wan for image2video with start and end frames, specialized for 720p.
+
+### April 17, 2025: WanGP v4.1
+👋 Recam Master model support: view a video from a different angle. The video to process must be at least 81 frames long, and you should set at least 15 denoising steps to get good results.
+
+### April 13, 2025: WanGP v4.0
+👋 Lots of goodies for you!
+- A new UI: tabs were replaced by a dropdown box to easily switch models
+- A new queuing system that lets you stack in a queue as many text2video, image2video tasks, ... as you want. Each task can rely on completely different generation parameters (different number of frames, steps, loras, ...). Many thanks to **Tophness** for being a big contributor on this new feature
+- Temporal upsampling (Rife) and spatial upsampling (Lanczos) for a smoother video (32 fps or 64 fps) and to enlarge your video by x2 or x4. Check these new advanced options.
+- Wan Vace ControlNet support: with Vace you can inject people or objects into a scene, animate a person, perform inpainting or outpainting, continue a video, ... See [VACE.md](VACE.md) for an introduction guide.
+- Integrated the *Matanyone* tool directly inside WanGP so that you can easily create the inpainting masks used in Vace
+- Sliding window generation for Vace: create windows that can last dozens of seconds
+- New optimizations for old generation GPUs: generate 5s (81 frames, 15 steps) of Vace 1.3B with only 5GB and in only 6 minutes on a RTX 2080Ti, and 5s of t2v 14B in less than 10 minutes.
+
+### March 27, 2025
+👋 Added support for the new Wan Fun InP models (image2video). The 14B Fun InP has probably better end image support, but unfortunately existing loras do not work so well with it. The great novelty is the Fun InP 1.3B image2video model: Image 2 Video is now accessible to even lower-end hardware configurations. It is not as good as the 14B models but very impressive for its size. Many thanks to the VideoX-Fun team (https://github.com/aigc-apps/VideoX-Fun)
+
+### March 26, 2025
+👋 Good news! Official support for RTX 50xx; please check the [installation instructions](INSTALLATION.md).
+
+### March 24, 2025: Wan2.1GP v3.2
+👋
+- Added Classifier-Free Guidance Zero Star. The video should match the text prompt better (especially with text2video) at no performance cost: many thanks to the **CFG Zero * Team**. Don't hesitate to give them a star if you appreciate the results: https://github.com/WeichenFan/CFG-Zero-star
+- Added back support for PyTorch compilation with Loras. It seems it had been broken for some time
+- Added the possibility to keep a number of pregenerated videos in the Video Gallery (useful to compare outputs of different settings)
+
+*You will need one more `pip install -r requirements.txt`*
+
+### March 19, 2025: Wan2.1GP v3.1
+👋 Faster launch and RAM optimizations (should require less RAM to run)
+
+*You will need one more `pip install -r requirements.txt`*
+
+### March 18, 2025: Wan2.1GP v3.0
+👋
+- New tab-based interface: you can switch from i2v to t2v and conversely without restarting the app
+- Experimental Dual Frames mode for i2v: you can also specify an end frame. It doesn't always work, so you may need a few attempts.
+- You can save default settings in the files *i2v_settings.json* and *t2v_settings.json* that will be used when launching the app (you can also specify the path to different settings files)
+- Slight acceleration with loras
+
+*You will need one more `pip install -r requirements.txt`*
+
+Many thanks to *Tophness*, who created the framework (and did a big part of the work) for the multi-tab and saved settings features
+
+### March 18, 2025: Wan2.1GP v2.11
+👋 Added more command line parameters to prefill the generation settings, plus a customizable output directory and a choice of metadata type for generated videos. Many thanks to *Tophness* for his contributions.
+
+*You will need one more `pip install -r requirements.txt` to reflect new dependencies*
+
+### March 18, 2025: Wan2.1GP v2.1
+👋 More Loras! Added support for 'Safetensors' and 'Replicate' Lora formats.
+
+*You will need to refresh the requirements with a `pip install -r requirements.txt`*
+
+### March 17, 2025: Wan2.1GP v2.0
+👋 The Lora festival continues:
+- Clearer user interface
+- Download 30 Loras in one click to try them all (expand the info section)
+- Loras are now very easy to use, as Lora presets can include the subject (or other needed terms) of the Lora so that you don't have to manually modify a prompt
+- Added a basic macro prompt language to prefill prompts with different values. With one prompt template, you can generate multiple prompts.
+- New multiple image prompts: you can now combine any number of images with any number of text prompts (need to launch the app with --multiple-images)
+- New command line options to launch directly the 1.3B t2v model or the 14B t2v model
+
+### March 14, 2025: Wan2.1GP v1.7
+👋
+- Lora Fest special edition: very fast loading/unloading of loras for the Lora collectors out there. You can also now add/remove loras in the Lora folder without restarting the app.
+- Added experimental Skip Layer Guidance (advanced settings), which should improve image quality at no extra cost. Many thanks to *AmericanPresidentJimmyCarter* for the original implementation
+
+*You will need to refresh the requirements `pip install -r requirements.txt`*
+
+### March 13, 2025: Wan2.1GP v1.6
+👋 Better Loras support, accelerated Lora loading.
+
+*You will need to refresh the requirements `pip install -r requirements.txt`*
+
+### March 10, 2025: Wan2.1GP v1.5
+👋 Official TeaCache support + Smart TeaCache (automatically finds the best parameters for a requested speed multiplier): 10% speed boost with no quality loss; improved lora presets (they can now include prompts and comments to guide the user)
+
+### March 7, 2025: Wan2.1GP v1.4
+👋 Fix for PyTorch compilation: now it is really 20% faster when activated
+
+### March 4, 2025: Wan2.1GP v1.3
+👋 Support for Image to Video with multiple images for different image/prompt combinations (requires the *--multiple-images* switch), and added the command line option *--preload x* to preload x MB of the main diffusion model in VRAM if you find there is too much unused VRAM and you want to (slightly) accelerate the generation process.
+
+*If you upgrade you will need to do a `pip install -r requirements.txt` again.*
+
+### March 4, 2025: Wan2.1GP v1.2
+👋 Implemented tiling on VAE encoding and decoding. No more VRAM peaks at the beginning and at the end
+
+### March 3, 2025: Wan2.1GP v1.1
+👋 Added TeaCache support for faster generations: an optimization of kijai's implementation (https://github.com/kijai/ComfyUI-WanVideoWrapper/) of TeaCache (https://github.com/ali-vilab/TeaCache)
+
+### March 2, 2025: Wan2.1GP by DeepBeepMeep v1
+👋 Brings:
+- Support for all Wan models, including the Image to Video model
+- Memory consumption reduced by a factor of 2, with the possibility to generate more than 10s of video at 720p with a RTX 4090 and 10s of video at 480p with less than 12GB of VRAM. Many thanks to RIFLEx (https://github.com/thu-ml/RIFLEx) for their algorithm that allows generating nice-looking videos longer than 5s.
+- The usual perks: web interface, multiple generations, loras support, sage attention, auto download of models, ...
+
+## Original Wan Releases
+
+### February 25, 2025
+👋 We've released the inference code and weights of Wan2.1.
+
+### February 27, 2025
+👋 Wan2.1 has been integrated into [ComfyUI](https://comfyanonymous.github.io/ComfyUI_examples/wan/). Enjoy!
\ No newline at end of file
diff --git a/docs/CLI.md b/docs/CLI.md
new file mode 100644
index 0000000000000000000000000000000000000000..b905e54fd5e0865590c4c46fa45194553faadaf4
--- /dev/null
+++ b/docs/CLI.md
@@ -0,0 +1,226 @@
+# Command Line Reference
+
+This document covers all available command line options for WanGP.
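+
+If your build follows the usual argparse convention (an assumption, not verified here), the full list of options supported by your local version can be printed with:
+```bash
+python wgp.py --help
+```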
+
+## Basic Usage
+
+```bash
+# Default launch
+python wgp.py
+
+# Specific model modes
+python wgp.py --i2v # Image-to-video
+python wgp.py --t2v # Text-to-video (default)
+python wgp.py --t2v-14B # 14B text-to-video model
+python wgp.py --t2v-1-3B # 1.3B text-to-video model
+python wgp.py --i2v-14B # 14B image-to-video model
+python wgp.py --i2v-1-3B # Fun InP 1.3B image-to-video model
+python wgp.py --vace-1-3B # VACE ControlNet 1.3B model
+```
+
+## Model and Performance Options
+
+### Model Configuration
+```bash
+--quantize-transformer BOOL # Enable/disable transformer quantization (default: True)
+--compile # Enable PyTorch compilation (requires Triton)
+--attention MODE # Force attention mode: sdpa, flash, sage, sage2
+--profile NUMBER # Performance profile 1-5 (default: 4)
+--preload NUMBER # Preload N MB of diffusion model in VRAM
+--fp16 # Force fp16 instead of bf16 models
+--gpu DEVICE # Run on specific GPU device (e.g., "cuda:1")
+```
+
+### Performance Profiles
+- **Profile 1**: Load entire current model in VRAM and keep all unused models in reserved RAM for fast VRAM transfers
+- **Profile 2**: Load model parts as needed, keep all unused models in reserved RAM for fast VRAM transfers
+- **Profile 3**: Load entire current model in VRAM (requires 24GB for 14B model)
+- **Profile 4**: Default and recommended, load model parts as needed, most flexible option
+- **Profile 5**: Minimum RAM usage
+
+### Memory Management
+```bash
+--perc-reserved-mem-max FLOAT # Max percentage of RAM for reserved memory (< 0.5)
+```
+
+## Lora Configuration
+
+```bash
+--lora-dir PATH # Path to Wan t2v loras directory
+--lora-dir-i2v PATH # Path to Wan i2v loras directory
+--lora-dir-hunyuan PATH # Path to Hunyuan t2v loras directory
+--lora-dir-hunyuan-i2v PATH # Path to Hunyuan i2v loras directory
+--lora-dir-ltxv PATH # Path to LTX Video loras directory
+--lora-preset PRESET # Load lora preset file (.lset) on startup
+--check-loras # Filter incompatible loras (slower startup)
+```
+
+## Generation Settings
+
+### Basic Generation
+```bash
+--seed NUMBER # Set default seed value
+--frames NUMBER # Set default number of frames to generate
+--steps NUMBER # Set default number of denoising steps
+--advanced # Launch with advanced mode enabled
+```
+
+### Advanced Generation
+```bash
+--teacache MULTIPLIER # TeaCache speed multiplier: 0, 1.5, 1.75, 2.0, 2.25, 2.5
+```
+
+## Interface and Server Options
+
+### Server Configuration
+```bash
+--server-port PORT # Gradio server port (default: 7860)
+--server-name NAME # Gradio server name (default: localhost)
+--listen # Make server accessible on network
+--share # Create shareable HuggingFace URL for remote access
+--open-browser # Open browser automatically when launching
+```
+
+### Interface Options
+```bash
+--lock-config # Prevent modifying video engine configuration from interface
+--theme THEME_NAME # UI theme: "default" or "gradio"
+```
+
+## File and Directory Options
+
+```bash
+--settings PATH # Path to folder containing default settings for all models
+--verbose LEVEL # Information level 0-2 (default: 1)
+```
+
+## Examples
+
+### Basic Usage Examples
+```bash
+# Launch with specific model and loras
+python wgp.py --t2v-14B --lora-preset mystyle.lset
+
+# High-performance setup with compilation
+python wgp.py --compile --attention sage2 --profile 3
+
+# Low VRAM setup
+python wgp.py --t2v-1-3B --profile 4 --attention sdpa
+
+# Multiple images with custom lora directory
+python wgp.py --i2v --multiple-images --lora-dir /path/to/shared/loras
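+
+# Two more illustrative pairings (every flag is documented above; the
+# combinations themselves are only suggestions, not tested recipes)
+# Low-VRAM image-to-video with the Fun InP 1.3B model
+python wgp.py --i2v-1-3B --profile 5 --attention sdpa
+
+# Reproducible quick draft: fixed seed, short clip, fewer steps
+python wgp.py --t2v-1-3B --seed 42 --frames 49 --steps 20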
+``` + +### Server Configuration Examples +```bash +# Network accessible server +python wgp.py --listen --server-port 8080 + +# Shareable server with custom theme +python wgp.py --share --theme gradio --open-browser + +# Locked configuration for public use +python wgp.py --lock-config --share +``` + +### Advanced Performance Examples +```bash +# Maximum performance (requires high-end GPU) +python wgp.py --compile --attention sage2 --profile 3 --preload 2000 + +# Optimized for RTX 2080Ti +python wgp.py --profile 4 --attention sdpa --teacache 2.0 + +# Memory-efficient setup +python wgp.py --fp16 --profile 4 --perc-reserved-mem-max 0.3 +``` + +### TeaCache Configuration +```bash +# Different speed multipliers +python wgp.py --teacache 1.5 # 1.5x speed, minimal quality loss +python wgp.py --teacache 2.0 # 2x speed, some quality loss +python wgp.py --teacache 2.5 # 2.5x speed, noticeable quality loss +python wgp.py --teacache 0 # Disable TeaCache +``` + +## Attention Modes + +### SDPA (Default) +```bash +python wgp.py --attention sdpa +``` +- Available by default with PyTorch +- Good compatibility with all GPUs +- Moderate performance + +### Sage Attention +```bash +python wgp.py --attention sage +``` +- Requires Triton installation +- 30% faster than SDPA +- Small quality cost + +### Sage2 Attention +```bash +python wgp.py --attention sage2 +``` +- Requires Triton and SageAttention 2.x +- 40% faster than SDPA +- Best performance option + +### Flash Attention +```bash +python wgp.py --attention flash +``` +- May require CUDA kernel compilation +- Good performance +- Can be complex to install on Windows + +## Troubleshooting Command Lines + +### Fallback to Basic Setup +```bash +# If advanced features don't work +python wgp.py --attention sdpa --profile 4 --fp16 +``` + +### Debug Mode +```bash +# Maximum verbosity for troubleshooting +python wgp.py --verbose 2 --check-loras +``` + +### Memory Issue Debugging +```bash +# Minimal memory usage +python wgp.py --profile 4 --attention sdpa --perc-reserved-mem-max 0.2 +``` + + + +## Configuration Files + +### Settings Files +Load custom settings: +```bash +python wgp.py --settings /path/to/settings/folder +``` + +### Lora Presets +Create and share lora configurations: +```bash +# Load specific preset +python wgp.py --lora-preset anime_style.lset + +# With custom lora directory +python wgp.py --lora-preset mystyle.lset --lora-dir /shared/loras +``` + +## Environment Variables + +While not command line options, these environment variables can affect behavior: +- `CUDA_VISIBLE_DEVICES` - Limit visible GPUs +- `PYTORCH_CUDA_ALLOC_CONF` - CUDA memory allocation settings +- `TRITON_CACHE_DIR` - Triton cache directory (for Sage attention) \ No newline at end of file diff --git a/docs/GETTING_STARTED.md b/docs/GETTING_STARTED.md new file mode 100644 index 0000000000000000000000000000000000000000..0bd492ce0a7c35d4a55406ca9fa40942bbf5e972 --- /dev/null +++ b/docs/GETTING_STARTED.md @@ -0,0 +1,194 @@ +# Getting Started with WanGP + +This guide will help you get started with WanGP video generation quickly and easily. + +## Prerequisites + +Before starting, ensure you have: +- A compatible GPU (RTX 10XX or newer recommended) +- Python 3.10.9 installed +- At least 6GB of VRAM for basic models +- Internet connection for model downloads + +## Quick Setup + +### Option 1: One-Click Installation (Recommended) +Use [Pinokio App](https://pinokio.computer/) for the easiest installation experience. 
+
+### Option 2: Manual Installation
+```bash
+git clone https://github.com/deepbeepmeep/Wan2GP.git
+cd Wan2GP
+conda create -n wan2gp python=3.10.9
+conda activate wan2gp
+pip install torch==2.6.0 torchvision torchaudio --index-url https://download.pytorch.org/whl/test/cu124
+pip install -r requirements.txt
+```
+
+For detailed installation instructions, see [INSTALLATION.md](INSTALLATION.md).
+
+## First Launch
+
+### Basic Launch
+```bash
+python wgp.py
+```
+This launches the WanGP generator with default settings. You will be able to pick the model you want to use from a dropdown menu.
+
+### Alternative Modes
+```bash
+python wgp.py --i2v # Wan image-to-video mode
+python wgp.py --t2v-1-3B # Smaller, faster Wan model
+```
+
+## Understanding the Interface
+
+When you launch WanGP, you'll see a web interface with several sections:
+
+### Main Generation Panel
+- **Model Selection**: Dropdown to choose between different models
+- **Prompt**: Text description of what you want to generate
+- **Generate Button**: Start the video generation process
+
+### Advanced Settings (click checkbox to enable)
+- **Generation Settings**: Steps, guidance, seeds
+- **Loras**: Additional style customizations
+- **Sliding Window**: For longer videos
+
+## Your First Video
+
+Let's generate a simple text-to-video:
+
+1. **Launch WanGP**: `python wgp.py`
+2. **Open Browser**: Navigate to `http://localhost:7860`
+3. **Enter Prompt**: "A cat walking in a garden"
+4. **Click Generate**: Wait for the video to be created
+5. **View Result**: The video will appear in the output section
+
+### Recommended First Settings
+- **Model**: Wan 2.1 text2video 1.3B (faster, lower VRAM)
+- **Frames**: 49 (about 2 seconds)
+- **Steps**: 20 (good balance of speed/quality)
+
+## Model Selection
+
+### Text-to-Video Models
+- **Wan 2.1 T2V 1.3B**: Fastest, lowest VRAM (6GB), good quality
+- **Wan 2.1 T2V 14B**: Best quality, requires more VRAM (12GB+)
+- **Hunyuan Video**: Excellent quality, slower generation
+- **LTX Video**: Good for longer videos
+
+### Image-to-Video Models
+- **Wan Fun InP 1.3B**: Fast image animation
+- **Wan Fun InP 14B**: Higher quality image animation
+- **VACE**: Advanced control over video generation
+
+### Choosing the Right Model
+- **Low VRAM (6-8GB)**: Use 1.3B models
+- **Medium VRAM (10-12GB)**: Use 14B models or Hunyuan
+- **High VRAM (16GB+)**: Any model, longer videos
+
+## Basic Settings Explained
+
+### Generation Settings
+- **Frames**: Number of frames (more = longer video)
+  - 25 frames ≈ 1 second
+  - 49 frames ≈ 2 seconds
+  - 73 frames ≈ 3 seconds
+
+- **Steps**: Quality vs speed tradeoff
+  - 15 steps: Fast, lower quality
+  - 20 steps: Good balance
+  - 30+ steps: High quality, slower
+
+- **Guidance Scale**: How closely to follow the prompt
+  - 3-5: More creative interpretation
+  - 7-10: Closer to prompt description
+  - 12+: Very literal interpretation
+
+### Seeds
+- **Random Seed**: Different result each time
+- **Fixed Seed**: Reproducible results
+- **Same seed + modified prompt**: Generate controlled variations
+
+## Common Beginner Issues
+
+### "Out of Memory" Errors
+1. Use smaller models (1.3B instead of 14B)
+2. Reduce frame count
+3. Lower resolution in advanced settings
+4. Enable quantization (usually on by default)
+
+### Slow Generation
+1. Use 1.3B models for speed
+2. Reduce number of steps
+3. Install Sage attention (see [INSTALLATION.md](INSTALLATION.md))
+4. Enable TeaCache: `python wgp.py --teacache 2.0`
+
+### Poor Quality Results
+1. Increase number of steps (25-30)
+2.
Improve prompt description +3. Use 14B models if you have enough VRAM +4. Enable Skip Layer Guidance in advanced settings + +## Writing Good Prompts + +### Basic Structure +``` +[Subject] [Action] [Setting] [Style/Quality modifiers] +``` + +### Examples +``` +A red sports car driving through a mountain road at sunset, cinematic, high quality + +A woman with long hair walking on a beach, waves in the background, realistic, detailed + +A cat sitting on a windowsill watching rain, cozy atmosphere, soft lighting +``` + +### Tips +- Be specific about what you want +- Include style descriptions (cinematic, realistic, etc.) +- Mention lighting and atmosphere +- Describe the setting in detail +- Use quality modifiers (high quality, detailed, etc.) + +## Next Steps + +Once you're comfortable with basic generation: + +1. **Explore Advanced Features**: + - [Loras Guide](LORAS.md) - Customize styles and characters + - [VACE ControlNet](VACE.md) - Advanced video control + - [Command Line Options](CLI.md) - Optimize performance + +2. **Improve Performance**: + - Install better attention mechanisms + - Optimize memory settings + - Use compilation for speed + +3. **Join the Community**: + - [Discord Server](https://discord.gg/g7efUW9jGV) - Get help and share videos + - Share your best results + - Learn from other users + +## Troubleshooting First Steps + +### Installation Issues +- Ensure Python 3.10.9 is used +- Check CUDA version compatibility +- See [INSTALLATION.md](INSTALLATION.md) for detailed steps + +### Generation Issues +- Check GPU compatibility +- Verify sufficient VRAM +- Try basic settings first +- See [TROUBLESHOOTING.md](TROUBLESHOOTING.md) for specific issues + +### Performance Issues +- Use appropriate model for your hardware +- Enable performance optimizations +- Check [CLI.md](CLI.md) for optimization flags + +Remember: Start simple and gradually explore more advanced features as you become comfortable with the basics! \ No newline at end of file diff --git a/docs/INSTALLATION.md b/docs/INSTALLATION.md new file mode 100644 index 0000000000000000000000000000000000000000..f076e7ed4839977214490a7df42d719a4da15cb5 --- /dev/null +++ b/docs/INSTALLATION.md @@ -0,0 +1,170 @@ +# Installation Guide + +This guide covers installation for different GPU generations and operating systems. + +## Requirements + +- Python 3.10.9 +- Conda or Python venv +- Compatible GPU (RTX 10XX or newer recommended) + +## Installation for RTX 10XX to RTX 40XX (Stable) + +This installation uses PyTorch 2.6.0 which is well-tested and stable. 
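+
+After completing the steps below, a quick sanity check (plain PyTorch, nothing WanGP-specific) confirms that the expected build is installed and can see your GPU:
+```shell
+python -c "import torch; print(torch.__version__, torch.cuda.is_available())"
+```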
+ +### Step 1: Download and Setup Environment + +```shell +# Clone the repository +git clone https://github.com/deepbeepmeep/Wan2GP.git +cd Wan2GP + +# Create Python 3.10.9 environment using conda +conda create -n wan2gp python=3.10.9 +conda activate wan2gp +``` + +### Step 2: Install PyTorch + +```shell +# Install PyTorch 2.6.0 with CUDA 12.4 +pip install torch==2.6.0 torchvision torchaudio --index-url https://download.pytorch.org/whl/test/cu124 +``` + +### Step 3: Install Dependencies + +```shell +# Install core dependencies +pip install -r requirements.txt +``` + +### Step 4: Optional Performance Optimizations + +#### Sage Attention (30% faster) + +```shell +# Windows only: Install Triton +pip install triton-windows + +# For both Windows and Linux +pip install sageattention==1.0.6 +``` + +#### Sage 2 Attention (40% faster) + +```shell +# Windows +pip install triton-windows +pip install https://github.com/woct0rdho/SageAttention/releases/download/v2.1.1-windows/sageattention-2.1.1+cu126torch2.6.0-cp310-cp310-win_amd64.whl + +# Linux (manual compilation required) +git clone https://github.com/thu-ml/SageAttention +cd SageAttention +pip install -e . +``` + +#### Flash Attention + +```shell +# May require CUDA kernel compilation on Windows +pip install flash-attn==2.7.2.post1 +``` + +## Installation for RTX 50XX (Beta) + +RTX 50XX GPUs require PyTorch 2.7.0 (beta). This version may be less stable. + +⚠️ **Important:** Use Python 3.10 for compatibility with pip wheels. + +### Step 1: Setup Environment + +```shell +# Clone and setup (same as above) +git clone https://github.com/deepbeepmeep/Wan2GP.git +cd Wan2GP +conda create -n wan2gp python=3.10.9 +conda activate wan2gp +``` + +### Step 2: Install PyTorch Beta + +```shell +# Install PyTorch 2.7.0 with CUDA 12.8 +pip install torch==2.7.0 torchvision torchaudio --index-url https://download.pytorch.org/whl/test/cu128 +``` + +### Step 3: Install Dependencies + +```shell +pip install -r requirements.txt +``` + +### Step 4: Optional Optimizations for RTX 50XX + +#### Sage Attention + +```shell +# Windows +pip install triton-windows +pip install sageattention==1.0.6 + +# Linux +pip install sageattention==1.0.6 +``` + +#### Sage 2 Attention + +```shell +# Windows +pip install triton-windows +pip install https://github.com/woct0rdho/SageAttention/releases/download/v2.1.1-windows/sageattention-2.1.1+cu128torch2.7.0-cp310-cp310-win_amd64.whl + +# Linux (manual compilation) +git clone https://github.com/thu-ml/SageAttention +cd SageAttention +pip install -e . +``` + +## Attention Modes + +WanGP supports several attention implementations: + +- **SDPA** (default): Available by default with PyTorch +- **Sage**: 30% speed boost with small quality cost +- **Sage2**: 40% speed boost +- **Flash**: Good performance, may be complex to install on Windows + +## Performance Profiles + +Choose a profile based on your hardware: + +- **Profile 3 (LowRAM_HighVRAM)**: Loads entire model in VRAM, requires 24GB VRAM for 8-bit quantized 14B model +- **Profile 4 (LowRAM_LowVRAM)**: Default, loads model parts as needed, slower but lower VRAM requirement + +## Troubleshooting + +### Sage Attention Issues + +If Sage attention doesn't work: + +1. Check if Triton is properly installed +2. Clear Triton cache +3. 
Fallback to SDPA attention: + ```bash + python wgp.py --attention sdpa + ``` + +### Memory Issues + +- Use lower resolution or shorter videos +- Enable quantization (default) +- Use Profile 4 for lower VRAM usage +- Consider using 1.3B models instead of 14B models + +### GPU Compatibility + +- RTX 10XX, 20XX: Supported with SDPA attention +- RTX 30XX, 40XX: Full feature support +- RTX 50XX: Beta support with PyTorch 2.7.0 + +For more troubleshooting, see [TROUBLESHOOTING.md](TROUBLESHOOTING.md) \ No newline at end of file diff --git a/docs/LORAS.md b/docs/LORAS.md new file mode 100644 index 0000000000000000000000000000000000000000..e911071b2c75631af647071319f2f083c4c862f8 --- /dev/null +++ b/docs/LORAS.md @@ -0,0 +1,224 @@ +# Loras Guide + +Loras (Low-Rank Adaptations) allow you to customize video generation models by adding specific styles, characters, or effects to your videos. + +## Directory Structure + +Loras are organized in different folders based on the model they're designed for: + +### Text-to-Video Models +- `loras/` - General t2v loras +- `loras/1.3B/` - Loras specifically for 1.3B models +- `loras/14B/` - Loras specifically for 14B models + +### Image-to-Video Models +- `loras_i2v/` - Image-to-video loras + +### Other Models +- `loras_hunyuan/` - Hunyuan Video t2v loras +- `loras_hunyuan_i2v/` - Hunyuan Video i2v loras +- `loras_ltxv/` - LTX Video loras + +## Custom Lora Directory + +You can specify custom lora directories when launching the app: + +```bash +# Use shared lora directory for both t2v and i2v +python wgp.py --lora-dir /path/to/shared/loras --lora-dir-i2v /path/to/shared/loras + +# Specify different directories for different models +python wgp.py --lora-dir-hunyuan /path/to/hunyuan/loras --lora-dir-ltxv /path/to/ltx/loras +``` + +## Using Loras + +### Basic Usage + +1. Place your lora files in the appropriate directory +2. Launch WanGP +3. In the Advanced Tab, select the "Loras" section +4. Check the loras you want to activate +5. Set multipliers for each lora (default is 1.0) + +### Lora Multipliers + +Multipliers control the strength of each lora's effect: + +#### Simple Multipliers +``` +1.2 0.8 +``` +- First lora: 1.2 strength +- Second lora: 0.8 strength + +#### Time-based Multipliers +For dynamic effects over generation steps, use comma-separated values: +``` +0.9,0.8,0.7 +1.2,1.1,1.0 +``` +- For 30 steps: steps 0-9 use first value, 10-19 use second, 20-29 use third +- First lora: 0.9 → 0.8 → 0.7 +- Second lora: 1.2 → 1.1 → 1.0 + +## Lora Presets + +Presets are combinations of loras with predefined multipliers and prompts. + +### Creating Presets +1. Configure your loras and multipliers +2. Write a prompt with comments (lines starting with #) +3. Save as a preset with `.lset` extension + +### Example Preset +``` +# Use the keyword "ohnvx" to trigger the lora +A ohnvx character is driving a car through the city +``` + +### Using Presets +```bash +# Load preset on startup +python wgp.py --lora-preset mypreset.lset +``` + +### Managing Presets +- Edit, save, or delete presets directly from the web interface +- Presets include comments with usage instructions +- Share `.lset` files with other users + +## CausVid Lora (Video Generation Accelerator) + +CausVid is a distilled Wan model that generates videos in 4-12 steps with 2x speed improvement. + +### Setup Instructions +1. Download the CausVid Lora: + ``` + https://huggingface.co/Kijai/WanVideo_comfy/blob/main/Wan21_CausVid_14B_T2V_lora_rank32.safetensors + ``` +2. 
Place in your `loras/` directory
+
+### Usage
+1. Select a Wan t2v model (e.g., Wan 2.1 text2video 14B or Vace 14B)
+2. Enable Advanced Mode
+3. In Advanced Generation Tab:
+   - Set Guidance Scale = 1
+   - Set Shift Scale = 7
+4. In Advanced Lora Tab:
+   - Select CausVid Lora
+   - Set multiplier to 0.3
+5. Set generation steps to 12
+6. Generate!
+
+### CausVid Step/Multiplier Relationship
+- **12 steps**: 0.3 multiplier (recommended)
+- **8 steps**: 0.5-0.7 multiplier
+- **4 steps**: 0.8-1.0 multiplier
+
+*Note: Fewer steps = lower quality (especially motion)*
+
+## Supported Formats
+
+WanGP supports multiple lora formats:
+- **Safetensors** (.safetensors)
+- **Replicate** format
+- **Standard PyTorch** (.pt, .pth)
+
+## AccVid Lora (Video Generation Accelerator)
+
+AccVid is a distilled Wan model that generates videos with a 2x speed improvement, since classifier-free guidance is no longer needed (that is, cfg = 1).
+
+### Setup Instructions
+1. Download the AccVid Lora:
+
+- for t2v models:
+  ```
+  https://huggingface.co/Kijai/WanVideo_comfy/blob/main/Wan21_AccVid_T2V_14B_lora_rank32_fp16.safetensors
+  ```
+
+- for i2v models:
+  ```
+  https://huggingface.co/Kijai/WanVideo_comfy/blob/main/Wan21_AccVid_I2V_480P_14B_lora_rank32_fp16.safetensors
+  ```
+
+2. Place in your `loras/` directory or `loras_i2v/` directory
+
+### Usage
+1. Select a Wan t2v model (e.g., Wan 2.1 text2video 14B or Vace 14B) or a Wan i2v model
+2. Enable Advanced Mode
+3. In Advanced Generation Tab:
+   - Set Guidance Scale = 1
+   - Set Shift Scale = 5
+4. The number of steps remains unchanged compared to what you would use with the original model, but generation will be two times faster since classifier-free guidance is not needed
+
+## Performance Tips
+
+### Fast Loading/Unloading
+- Loras can be added/removed without restarting the app
+- Use the "Refresh" button to detect new loras
+- Enable `--check-loras` to filter incompatible loras (slower startup)
+
+### Memory Management
+- Loras are loaded on-demand to save VRAM
+- Multiple loras can be used simultaneously
+- Time-based multipliers don't use extra memory
+
+## Finding Loras
+
+### Sources
+- **[Civitai](https://civitai.com/)** - Large community collection
+- **HuggingFace** - Official and community loras
+- **Discord Server** - Community recommendations
+
+### Creating Loras
+- **Kohya** - Popular training tool
+- **OneTrainer** - Alternative training solution
+- **Custom datasets** - Train on your own content
+
+## Macro System (Advanced)
+
+Create multiple prompts from templates using macros:
+
+```
+! {Subject}="cat","woman","man", {Location}="forest","lake","city", {Possessive}="its","her","his"
+In the video, a {Subject} is presented. The {Subject} is in a {Location} and looks at {Possessive} watch.
+```
+
+This generates:
+1. "In the video, a cat is presented. The cat is in a forest and looks at its watch."
+2. "In the video, a woman is presented. The woman is in a lake and looks at her watch."
+3. "In the video, a man is presented. The man is in a city and looks at his watch."
+
+## Troubleshooting
+
+### Lora Not Working
+1. Check if the lora is compatible with your model size (1.3B vs 14B)
+2. Verify the lora format is supported
+3. Try different multiplier values
+4. Check that the lora was trained for your model type (t2v vs i2v)
+
+### Performance Issues
+1. Reduce the number of active loras
+2. Lower multiplier values
+3. Use `--check-loras` to filter incompatible files
+4. Clear the lora cache if issues persist
+
+### Memory Errors
+1. Use fewer loras simultaneously
+2. Reduce model size (use 1.3B instead of 14B)
+3. Lower video resolution or frame count
+4. Enable quantization if not already active
+
+## Command Line Options
+
+```bash
+# Lora-related command line options
+--lora-dir path # Path to t2v loras directory
+--lora-dir-i2v path # Path to i2v loras directory
+--lora-dir-hunyuan path # Path to Hunyuan t2v loras
+--lora-dir-hunyuan-i2v path # Path to Hunyuan i2v loras
+--lora-dir-ltxv path # Path to LTX Video loras
+--lora-preset preset # Load preset on startup
+--check-loras # Filter incompatible loras
+```
\ No newline at end of file
diff --git a/docs/MODELS.md b/docs/MODELS.md
new file mode 100644
index 0000000000000000000000000000000000000000..1fd2e83e22824183607de684b9586141ead647d8
--- /dev/null
+++ b/docs/MODELS.md
@@ -0,0 +1,268 @@
+# Models Overview
+
+WanGP supports multiple video generation models, each optimized for different use cases and hardware configurations.
+
+
+## Wan 2.1 Text2Video Models
+Please note that the term *Text2Video* refers to the underlying Wan architecture, but as it has been greatly improved over time, many derived Text2Video models can now also generate videos from images.
+
+#### Wan 2.1 Text2Video 1.3B
+- **Size**: 1.3 billion parameters
+- **VRAM**: 6GB minimum
+- **Speed**: Fast generation
+- **Quality**: Good quality for the size
+- **Best for**: Quick iterations, lower-end hardware
+- **Command**: `python wgp.py --t2v-1-3B`
+
+#### Wan 2.1 Text2Video 14B
+- **Size**: 14 billion parameters
+- **VRAM**: 12GB+ recommended
+- **Speed**: Slower but higher quality
+- **Quality**: Excellent detail and coherence
+- **Best for**: Final production videos
+- **Command**: `python wgp.py --t2v-14B`
+
+#### Wan Vace 1.3B
+- **Type**: ControlNet for advanced video control
+- **VRAM**: 6GB minimum
+- **Features**: Motion transfer, object injection, inpainting
+- **Best for**: Advanced video manipulation
+- **Command**: `python wgp.py --vace-1-3B`
+
+#### Wan Vace 14B
+- **Type**: Large ControlNet model
+- **VRAM**: 12GB+ recommended
+- **Features**: All Vace features with higher quality
+- **Best for**: Professional video editing workflows
+
+#### MoviiGen (Experimental)
+- **Resolution**: Claims 1080p capability
+- **VRAM**: 20GB+ required
+- **Speed**: Very slow generation
+- **Features**: Should generate cinema-like video, specialized for 2.1:1 aspect ratios
+- **Status**: Experimental, feedback welcome
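+
+As a suggested mapping from hardware to the commands above (the flags are the documented ones; the pairing is only a rule of thumb):
+```bash
+# 6GB VRAM: fast 1.3B text-to-video
+python wgp.py --t2v-1-3B
+# 12GB+ VRAM: higher quality 14B text-to-video
+python wgp.py --t2v-14B
+# Advanced video control on low VRAM
+python wgp.py --vace-1-3B
+```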