# FramePack Image Edit Early Lora
This repository contains the steps and scripts needed to edit an image with an image-to-video model.
The model uses LoRA (Low-Rank Adaptation) weights together with pre-trained components to produce an edited image from an input image and a text prompt.
## Prerequisites
Before proceeding, ensure that you have the following installed on your system:
• **Ubuntu** (or a compatible Linux distribution)
• **Python 3.x**
• **pip** (Python package manager)
• **Git**
• **Git LFS** (Git Large File Storage)
• **FFmpeg**
## Installation
1. **Update and Install Dependencies**
```bash
sudo apt-get update && sudo apt-get install cbm git-lfs ffmpeg
```
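An optional sanity check, assuming the packages above installed cleanly, is to confirm the tools are on your PATH before continuing:
```bash
# Optional: confirm git, git-lfs, and ffmpeg are installed and reachable
git --version
git lfs version
ffmpeg -version | head -n 1
```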
2. **Clone the Repository**
```bash
git clone https://huggingface.co/svjack/FramePack_Image_Edit_Lora_Early
cd FramePack_Image_Edit_Lora_Early
```
3. **Install Python Dependencies**
```bash
pip install torch torchvision
pip install -r requirements.txt
pip install ascii-magic matplotlib tensorboard huggingface_hub datasets
pip install moviepy==1.0.3
pip install sageattention==1.0.6
```
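Before running inference it can help to confirm that the installed PyTorch build actually sees a GPU; a minimal check (assumes a CUDA build of torch):
```bash
# Optional: print the torch version and whether a CUDA device is visible
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"
```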
4. **Download Model Weights**
```bash
git clone https://huggingface.co/lllyasviel/FramePackI2V_HY
git clone https://huggingface.co/hunyuanvideo-community/HunyuanVideo
git clone https://huggingface.co/Comfy-Org/HunyuanVideo_repackaged
git clone https://huggingface.co/Comfy-Org/sigclip_vision_384
```
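Because the weights are stored with Git LFS, it is worth checking that the files referenced by the commands in the Usage section were actually fetched (a pointer file of a few hundred bytes means the LFS download did not happen):
```bash
# Optional: verify the weight files used below exist and have realistic sizes
ls -lh \
  FramePackI2V_HY/diffusion_pytorch_model-00001-of-00003.safetensors \
  HunyuanVideo/vae/diffusion_pytorch_model.safetensors \
  HunyuanVideo_repackaged/split_files/text_encoders/llava_llama3_fp16.safetensors \
  HunyuanVideo_repackaged/split_files/text_encoders/clip_l.safetensors \
  sigclip_vision_384/sigclip_vision_patch14_384.safetensors
```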
## Usage
To edit an image, run the `fpack_generate_video.py` script with the parameters shown in the examples below.
Use 512x512 as the output size, since that is the resolution the LoRA was trained at. A shell wrapper that factors out the flags shared by all three examples is shown after them.
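If you also want the input image at the 512x512 training resolution, FFmpeg (installed above) can center-crop and resize it; `input.jpg` and `input_512.jpg` below are placeholder names:
```bash
# Optional: square center-crop and resize an input image to 512x512 (placeholder file names)
ffmpeg -i input.jpg -vf "crop=w='min(iw,ih)':h='min(iw,ih)',scale=512:512" input_512.jpg
```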
* 1 Add a cat
- Input

```bash
python fpack_generate_video.py \
--dit FramePackI2V_HY/diffusion_pytorch_model-00001-of-00003.safetensors \
--vae HunyuanVideo/vae/diffusion_pytorch_model.safetensors \
--text_encoder1 HunyuanVideo_repackaged/split_files/text_encoders/llava_llama3_fp16.safetensors \
--text_encoder2 HunyuanVideo_repackaged/split_files/text_encoders/clip_l.safetensors \
--image_encoder sigclip_vision_384/sigclip_vision_patch14_384.safetensors \
--image_path xiang_image.jpg \
--prompt "add a cat into the picture" \
--video_size 512 512 --fps 30 --infer_steps 25 \
--attn_mode sdpa --fp8_scaled \
--vae_chunk_size 32 --vae_spatial_tile_sample_min_size 128 \
--save_path save --video_sections 1 --output_type latent_images --one_frame_inference zero_post \
--seed 1234 --lora_multiplier 1.0 --lora_weight framepack_edit_output/framepack-edit-lora-000005.safetensors
```
- Output

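The result is written under the directory passed to `--save_path` (`save` in these commands), so it can be inspected with:
```bash
# List the generated outputs (directory name taken from --save_path above)
ls -lh save/
```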
* 2 Change Background
- Input

```bash
python fpack_generate_video.py \
--dit FramePackI2V_HY/diffusion_pytorch_model-00001-of-00003.safetensors \
--vae HunyuanVideo/vae/diffusion_pytorch_model.safetensors \
--text_encoder1 HunyuanVideo_repackaged/split_files/text_encoders/llava_llama3_fp16.safetensors \
--text_encoder2 HunyuanVideo_repackaged/split_files/text_encoders/clip_l.safetensors \
--image_encoder sigclip_vision_384/sigclip_vision_patch14_384.safetensors \
--image_path wanye.jpg \
--prompt "Change the background into a restaurant in anime style. Keep the character's eye colors and white hair unchanged." \
--video_size 512 512 --fps 30 --infer_steps 25 \
--attn_mode sdpa --fp8_scaled \
--vae_chunk_size 32 --vae_spatial_tile_sample_min_size 128 \
--save_path save --video_sections 1 --output_type latent_images --one_frame_inference zero_post \
--seed 1234 --lora_multiplier 1.0 --lora_weight framepack_edit_output/framepack-edit-lora-000005.safetensors
```
- Output

* 3 Place the train into a landscape
- Input

```bash
python fpack_generate_video.py \
--dit FramePackI2V_HY/diffusion_pytorch_model-00001-of-00003.safetensors \
--vae HunyuanVideo/vae/diffusion_pytorch_model.safetensors \
--text_encoder1 HunyuanVideo_repackaged/split_files/text_encoders/llava_llama3_fp16.safetensors \
--text_encoder2 HunyuanVideo_repackaged/split_files/text_encoders/clip_l.safetensors \
--image_encoder sigclip_vision_384/sigclip_vision_patch14_384.safetensors \
--image_path train.jpg \
--prompt "place the train into a beautiful landscape" \
--video_size 512 512 --fps 30 --infer_steps 25 \
--attn_mode sdpa --fp8_scaled \
--vae_chunk_size 32 --vae_spatial_tile_sample_min_size 128 \
--save_path save --video_sections 1 --output_type latent_images --one_frame_inference zero_post \
--seed 1234 --lora_multiplier 1.0 --lora_weight framepack_edit_output/framepack-edit-lora-000005.safetensors
```
- Output

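The three example commands above differ only in `--image_path` and `--prompt`. A small shell wrapper, shown here as a sketch rather than as part of the repository, keeps the shared flags in one place:
```bash
# Sketch: wrap the shared flags so only the input image and prompt change per run
run_edit() {
  python fpack_generate_video.py \
    --dit FramePackI2V_HY/diffusion_pytorch_model-00001-of-00003.safetensors \
    --vae HunyuanVideo/vae/diffusion_pytorch_model.safetensors \
    --text_encoder1 HunyuanVideo_repackaged/split_files/text_encoders/llava_llama3_fp16.safetensors \
    --text_encoder2 HunyuanVideo_repackaged/split_files/text_encoders/clip_l.safetensors \
    --image_encoder sigclip_vision_384/sigclip_vision_patch14_384.safetensors \
    --image_path "$1" \
    --prompt "$2" \
    --video_size 512 512 --fps 30 --infer_steps 25 \
    --attn_mode sdpa --fp8_scaled \
    --vae_chunk_size 32 --vae_spatial_tile_sample_min_size 128 \
    --save_path save --video_sections 1 --output_type latent_images --one_frame_inference zero_post \
    --seed 1234 --lora_multiplier 1.0 --lora_weight framepack_edit_output/framepack-edit-lora-000005.safetensors
}

# Example calls using the inputs from this README
run_edit xiang_image.jpg "add a cat into the picture"
run_edit wanye.jpg "Change the background into a restaurant in anime style. Keep the character's eye colors and white hair unchanged."
run_edit train.jpg "place the train into a beautiful landscape"
```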