---
license: mit
language:
  - en
base_model:
  - Wan-AI/Wan2.1-I2V-14B-480P
  - Wan-AI/Wan2.1-I2V-14B-480P-Diffusers
pipeline_tag: image-to-video
library_name: diffusers
tags:
  - AIGC
  - LoRA
  - adapter
---

Please refer to our GitHub repository for more information: https://github.com/alibaba/wan-toy-transform

# Wan Toy Transform


Alibaba Research Intelligence Computing

This is a LoRA model fine-tuned on Wan-I2V-14B-480P. It transforms objects in the input image into cute, fluffy JellyCat-style toys.
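
Since the card targets the `diffusers` library, the LoRA can presumably also be applied on top of the Diffusers base model from Python. Below is a minimal sketch using the standard Diffusers Wan2.1 I2V workflow; the LoRA repo id and the input file names are illustrative placeholders, not confirmed by this README.

```python
# Sketch: load the toy-transform LoRA on the Diffusers Wan2.1 I2V pipeline.
# Assumption: the LoRA repo id below is hypothetical; replace it (and the
# weight file name, if needed) with the actual repository.
import torch
from diffusers import AutoencoderKLWan, WanImageToVideoPipeline
from diffusers.utils import export_to_video, load_image
from transformers import CLIPVisionModel

model_id = "Wan-AI/Wan2.1-I2V-14B-480P-Diffusers"
image_encoder = CLIPVisionModel.from_pretrained(
    model_id, subfolder="image_encoder", torch_dtype=torch.float32
)
vae = AutoencoderKLWan.from_pretrained(model_id, subfolder="vae", torch_dtype=torch.float32)
pipe = WanImageToVideoPipeline.from_pretrained(
    model_id, vae=vae, image_encoder=image_encoder, torch_dtype=torch.bfloat16
)
pipe.load_lora_weights("alibaba/wan-toy-transform")  # hypothetical repo id
pipe.to("cuda")

# First-frame image, resized to the pipeline's default 480P resolution.
image = load_image("input.jpg").resize((832, 480))
prompt = (
    "The video opens with a clear view of a cat. Then it transforms to a "
    "JellyCat-style cat. It has a face and a cute, fluffy and playful appearance."
)
frames = pipe(image=image, prompt=prompt, num_frames=81, guidance_scale=5.0).frames[0]
export_to_video(frames, "output.mp4", fps=16)
```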

## 🐍 Installation

```bash
# Python 3.12 and PyTorch 2.6.0 are tested.
pip install torch==2.6.0 torchvision==0.21.0 --index-url https://download.pytorch.org/whl/cu124
pip install -r requirements.txt
```

## 🔄 Inference

```bash
python generate.py \
  --prompt "The video opens with a clear view of a $name. Then it transforms to a JellyCat-style $name. It has a face and a cute, fluffy and playful appearance." \
  --image $image_path \
  --save_file "output.mp4" \
  --offload_type leaf_level
```

Note:

- Replace `$name` with the name of the object you want to transform.
- `$image_path` is the path to the first-frame image.
- Choose `--offload_type` from `['leaf_level', 'block_level', 'none', 'model']`. More details can be found here.
- VRAM usage and generation time for each `--offload_type` are listed below; a sketch of how these levels map onto Diffusers' offloading utilities follows the table.

| `--offload_type` | VRAM Usage | Generation Time (NVIDIA A100) |
| --- | --- | --- |
| `leaf_level` | 11.9 GB | 17m17s |
| `block_level` (`num_blocks_per_group=1`) | 20.5 GB | 16m48s |
| `model` | 39.4 GB | 16m24s |
| `none` | 55.9 GB | 16m08s |
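
These offload levels correspond to Diffusers' offloading utilities. Here is a minimal sketch of how they might be wired up, assuming Diffusers' `apply_group_offloading` and `enable_model_cpu_offload` APIs; `pipe` stands for the pipeline built in the loading sketch above, and this is not the repo's actual `generate.py` code.

```python
# Sketch: map each --offload_type value onto a Diffusers offloading strategy.
import torch
from diffusers.hooks import apply_group_offloading

offload_type = "leaf_level"  # one of: leaf_level, block_level, model, none
onload_device = torch.device("cuda")
offload_device = torch.device("cpu")

if offload_type == "leaf_level":
    # Offload every leaf module to CPU: lowest VRAM, most transfers.
    apply_group_offloading(
        pipe.transformer,
        onload_device=onload_device,
        offload_device=offload_device,
        offload_type="leaf_level",
    )
elif offload_type == "block_level":
    # Offload groups of transformer blocks: a VRAM/speed trade-off.
    apply_group_offloading(
        pipe.transformer,
        onload_device=onload_device,
        offload_device=offload_device,
        offload_type="block_level",
        num_blocks_per_group=1,
    )
elif offload_type == "model":
    # Move whole sub-models (text encoder, transformer, VAE) on/off the GPU.
    pipe.enable_model_cpu_offload()
else:  # "none"
    pipe.to("cuda")  # keep everything resident on the GPU
```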

## 🤝 Acknowledgements

Special thanks to these projects for their contributions to the community!

## 📄 Our previous work