YoadTew committed
Commit 12af8b2 · 1 Parent(s): 69732f0

Update readme

Files changed (1): README.md (+10 -168)
README.md CHANGED
@@ -1,171 +1,13 @@
- # 🎨 Add-it: Training-Free Object Insertion in Images With Pretrained Diffusion Models
-
- <div align="center">
-
- [![arXiv](https://img.shields.io/badge/arXiv-2411.07232-b31b1b.svg)](https://arxiv.org/abs/2411.07232)
- [![Project Website](https://img.shields.io/badge/🌐-Project%20Website-blue)](https://research.nvidia.com/labs/par/addit/)
-
- </div>
-
- ## 👥 Authors
-
- **Yoad Tewel**<sup>1,2</sup>, **Rinon Gal**<sup>1,2</sup>, **Dvir Samuel**<sup>3</sup>, **Yuval Atzmon**<sup>1</sup>, **Lior Wolf**<sup>2</sup>, **Gal Chechik**<sup>1</sup>
-
- <sup>1</sup>NVIDIA • <sup>2</sup>Tel Aviv University • <sup>3</sup>Bar-Ilan University
-
- <div align="center">
- <img src="https://research.nvidia.com/labs/par/addit/static/images/Teaser.png" alt="Add-it Teaser" width="800"/>
- </div>
-
- ## 📄 Abstract
-
- Adding objects into images based on text instructions is a challenging task in semantic image editing, requiring a balance between preserving the original scene and seamlessly integrating the new object in a fitting location. Despite extensive efforts, existing models often struggle with this balance, particularly with finding a natural location for adding an object in complex scenes.
-
- We introduce **Add-it**, a training-free approach that extends diffusion models' attention mechanisms to incorporate information from three key sources: the scene image, the text prompt, and the generated image itself. Our weighted extended-attention mechanism maintains structural consistency and fine details while ensuring natural object placement.
-
- Without task-specific fine-tuning, Add-it achieves state-of-the-art results on both real and generated image insertion benchmarks, including our newly constructed "Additing Affordance Benchmark" for evaluating object placement plausibility, outperforming supervised methods. Human evaluations show that Add-it is preferred in over 80% of cases, and it also demonstrates improvements in various automated metrics.
-
- ---
-
- ## 📋 Description
-
- This repository contains the official implementation of the Add-it paper, providing tools for seamless object insertion into images using pretrained diffusion models.
-
- ---
-
- ## 🛠️ Setup
-
- ```bash
- conda env create -f environment.yml
- conda activate addit
- ```
-
  ---
-
- ## 🚀 Usage
-
- ### 💻 Command Line Interface (CLI)
-
- Add-it provides two CLI scripts for different use cases:
-
- #### 1. 🎭 Adding Objects to Generated Images
-
- Use `run_CLI_addit_generated.py` to add objects to AI-generated images:
-
- ```bash
- python run_CLI_addit_generated.py \
- --prompt_source "A photo of a cat sitting on the couch" \
- --prompt_target "A photo of a cat wearing a red hat sitting on the couch" \
- --subject_token "hat"
- ```
-
- ##### ⚙️ Options for Generated Images
-
- **🔴 Required Arguments:**
- - `--prompt_source`: Source prompt for generating the base image
- - `--prompt_target`: Target prompt describing the desired edited image
- - `--subject_token`: Single token representing the subject to add (must appear in prompt_target)
-
- **🔵 Optional Arguments:**
- - `--output_dir`: Directory to save output images (default: "outputs")
- - `--seed_src`: Seed for source generation (default: 6311)
- - `--seed_obj`: Seed for edited image generation (default: 1)
- - `--extended_scale`: Extended attention scale (default: 1.05)
- - `--structure_transfer_step`: Structure transfer step (default: 2)
- - `--blend_steps`: Blend steps (default: [15]). To allow for changes in the input image pass `--blend_steps` with empty value.
- - `--localization_model`: Localization model (default: "attention_points_sam")
- - **Options:** `attention_points_sam`, `attention`, `attention_box_sam`, `attention_mask_sam`, `grounding_sam`
- - `--show_attention`: Show attention maps using pyplot (flag), will be saved to `attn_vis.png`.
-
- #### 2. 📸 Adding Objects to Real Images
-
- Use `run_CLI_addit_real.py` to add objects to existing images:
-
- ```bash
- python run_CLI_addit_real.py \
- --source_image "images/bed_dark_room.jpg" \
- --prompt_source "A photo of a bed in a dark room" \
- --prompt_target "A photo of a dog lying on a bed in a dark room" \
- --subject_token "dog"
- ```
-
- ##### ⚙️ Options for Real Images
-
- **🔴 Required Arguments:**
- - `--source_image`: Path to the source image (default: "images/bed_dark_room.jpg")
- - `--prompt_source`: Source prompt describing the original image
- - `--prompt_target`: Target prompt describing the desired edited image
- - `--subject_token`: Subject token to add to the image (must appear in prompt_target)
-
- **🔵 Optional Arguments:**
- - `--output_dir`: Directory to save output images (default: "outputs")
- - `--seed_src`: Seed for source generation (default: 6311)
- - `--seed_obj`: Seed for edited image generation (default: 1)
- - `--extended_scale`: Extended attention scale (default: 1.1)
- - `--structure_transfer_step`: Structure transfer step (default: 4)
- - `--blend_steps`: Blend steps (default: [18]). To allow for changes in the input image pass `--blend_steps` with empty value.
- - `--localization_model`: Localization model (default: "attention")
- - **Options:** `attention_points_sam`, `attention`, `attention_box_sam`, `attention_mask_sam`, `grounding_sam`
- - `--use_offset`: Use offset in processing (flag)
- - `--show_attention`: Show attention maps using pyplot (flag), will be saved to `attn_vis.png`.
- - `--disable_inversion`: Disable source image inversion (flag)
-
- ---
-
- ### 📓 Jupyter Notebooks
-
- You can run Add-it in two interactive modes:
-
- | Mode | Notebook | Description |
- |------|----------|-------------|
- | 🎭 **Generated Images** | `run_addit_generated.ipynb` | Adding objects to AI-generated images |
- | 📸 **Real Images** | `run_addit_real.ipynb` | Adding objects to existing real images |
-
- The notebooks contain examples of different prompts and parameters that can be adjusted to control the object insertion process.
-
- ---
-
- ## 💡 Tips for Better Results
-
- - **Prompt Design**: The `--prompt_target` should be similar to the `--prompt_source`, but include a description of the new object to insert
- - **Seed Variation**: Try different values for `--seed_obj` - some prompts may require a few attempts to get satisfying results
- - **Localization Models**: The most effective `--localization_model` options are `attention_points_sam` and `attention`. Use the `--show_attention` flag to visualize localization performance
- - **Object Placement Issues**: If the object is not added to the image:
- - Try **decreasing** `--structure_transfer_step`
- - Try **increasing** `--extended_scale`
- - **Flexibility**: To allow more flexibility in modifying the source image, set `--blend_steps` to an empty value to send an empty list: `[]`
-
- ---
-
- ## 📰 News
-
- - **🎉 2025 JUL**: Official Add-it implementation is released!
-
- ---
-
- ## 📝 TODO
-
- - [x] Release code
-
- ---
-
- ## 📚 Citation
-
- If you make use of our work, please cite our paper:
-
- ```bibtex
- @misc{tewel2024addit,
- title={Add-it: Training-Free Object Insertion in Images With Pretrained Diffusion Models},
- author={Yoad Tewel and Rinon Gal and Dvir Samuel and Yuval Atzmon and Lior Wolf and Gal Chechik},
- year={2024},
- eprint={2411.07232},
- archivePrefix={arXiv},
- primaryClass={cs.CV}
- }
- ```
-
  ---

- <div align="center">
- <strong>🌟 Star this repo if you find it useful! 🌟</strong>
- </div>
  ---
+ title: Addit
+ emoji:
+ colorFrom: pink
+ colorTo: yellow
+ sdk: gradio
+ sdk_version: 5.36.2
+ app_file: app.py
+ pinned: false
+ license: other
  ---

+ Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
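
This commit replaces the project README with a Hugging Face Spaces configuration header: a YAML frontmatter block delimited by `---` lines, followed by the Markdown body. As a rough illustration of that structure, here is a minimal pure-Python sketch that splits such a file into a config mapping and a body. The `split_front_matter` helper is hypothetical (it is not part of the Add-it repo or of Hugging Face tooling, which uses a full YAML parser); it only handles flat `key: value` lines like the ones in this commit.

```python
# Hypothetical helper, for illustration only: split a Spaces-style README
# into (frontmatter dict, Markdown body). Real Spaces tooling parses the
# frontmatter with a proper YAML library.

README = """\
---
title: Addit
emoji:
colorFrom: pink
colorTo: yellow
sdk: gradio
sdk_version: 5.36.2
app_file: app.py
pinned: false
license: other
---

Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
"""

def split_front_matter(text):
    """Return (config, body); config is {} if no leading '---' block exists."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text
    try:
        end = lines[1:].index("---") + 1  # index of the closing delimiter
    except ValueError:
        return {}, text                   # unterminated block: treat as body
    config = {}
    for line in lines[1:end]:
        key, _, value = line.partition(":")
        config[key.strip()] = value.strip()
    body = "\n".join(lines[end + 1:]).strip()
    return config, body

config, body = split_front_matter(README)
print(config["sdk"], config["app_file"])  # → gradio app.py
```

The `sdk: gradio` / `app_file: app.py` pair is what tells the Space to launch a Gradio app from `app.py`, which is why the full usage documentation could be dropped from the README itself.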