---
title: EditP23
emoji: 🎨
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 5.38.2
app_file: app.py
pinned: false
---

# EditP23: 3D Editing via Propagation of Image Prompts to Multi-View

[![Project Page](https://img.shields.io/badge/Project-Page-blue)](https://editp23.github.io/)
[![arXiv](https://img.shields.io/badge/arXiv-2506.20652-b31b1b.svg)](https://arxiv.org/abs/2506.20652)

This repository contains the official implementation for **EditP23**, a method for fast, mask-free 3D editing that propagates 2D image edits to multi-view representations in a 3D-consistent manner.
The edit is guided by an image pair, allowing users to leverage any preferred 2D editing tool, from manual painting to generative pipelines.

### Installation
<details>
<summary>Click to expand installation instructions</summary>

This project was tested on a Linux system with Python 3.11 and CUDA 12.6.

**1. Clone the Repository**
```bash
git clone --recurse-submodules https://github.com/editp23/EditP23.git
cd EditP23
```

**2. Install Dependencies**
```bash
conda create -n editp23 python=3.11 -y
conda activate editp23
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu126 # Ensure compatibility with your CUDA version (tested with torch 2.6, CUDA 12.6).
pip install diffusers==0.30.1 transformers accelerate pillow huggingface_hub numpy tqdm
```

</details>

### Quick Start

**1. Prepare Your Experiment Directory**

Create a directory for your experiment. Inside this directory, you must place three specific PNG files:

* `src.png`: The original, unedited view of your object.
* `edited.png`: The same view after you have applied your desired 2D edit.
* `src_mv.png`: The multi-view grid of the original object, which will be edited.

Your directory structure should look like this:
```text
examples/
└── robot_sunglasses/
    ├── src.png
    ├── edited.png
    └── src_mv.png
```
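
Since all three files are required, it can help to verify them before launching a run. Below is a minimal sketch (an illustrative helper, not part of the repository; the filenames follow the convention above):
```python
# Illustrative pre-flight check for an experiment directory
# (not part of the repository; filenames follow the convention above).
from pathlib import Path

def check_exp_dir(exp_dir: str) -> None:
    """Raise if any of the three required input PNGs is missing."""
    required = ("src.png", "edited.png", "src_mv.png")
    missing = [name for name in required if not (Path(exp_dir) / name).is_file()]
    if missing:
        raise FileNotFoundError(f"{exp_dir} is missing: {', '.join(missing)}")

check_exp_dir("examples/robot_sunglasses")
```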

**2. Run the Editing Script**

Execute the `main.py` script, pointing it to your experiment directory. You can adjust the guidance parameters based on the complexity of your edit.

#### Execution Examples

* **Mild Edit (Appearance Change):**
    ```bash
    python src/main.py --exp_dir examples/robot_sunglasses --tar_guidance_scale 5.0 --n_max 31
    ```
* **Hard Edit (Large Geometry Change):**
    ```bash
    python src/main.py --exp_dir examples/deer_wings --tar_guidance_scale 21.0 --n_max 39
    ```

The output will be saved in the `output/` subdirectory within your experiment folder.

### Command-Line Arguments

* `--exp_dir`: (Required) Path to the experiment directory.
* `--T_steps`: Total number of denoising steps. Default: `50`.
* `--n_max`: Number of denoising steps during which edit-aware guidance is applied. Higher values can help with more complex edits. Must not exceed `T_steps`. Default: `31`.
* `--src_guidance_scale`: CFG scale for the source condition; it can usually be left at its default. Default: `3.5`.
* `--tar_guidance_scale`: CFG scale for the target (edited) condition. Higher values apply the edit more strongly. Default: `5.0`.
* `--seed`: Random seed for reproducibility. Default: `18`.
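
Because the best guidance strength depends on how drastic the edit is, it can be convenient to sweep `--tar_guidance_scale` and `--n_max` over a few values and compare the outputs. A minimal sketch (illustrative only; it reuses the flags documented above):
```python
# Illustrative parameter sweep over the two knobs that most affect edit
# strength (not part of the repository; flags are those documented above).
import subprocess

for scale in (5.0, 13.0, 21.0):
    for n_max in (31, 39):
        subprocess.run(
            ["python", "src/main.py",
             "--exp_dir", "examples/deer_wings",
             "--tar_guidance_scale", str(scale),
             "--n_max", str(n_max)],
            check=True,
        )
        # Results land in examples/deer_wings/output/; move or rename them
        # between runs if the script overwrites its outputs.
```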


# Results in Multi-View

### Deer - Pixar style & Wings

| |                            Cond. View                             |                        View 1                        |                        View 2                        |                        View 3                        |
| :--- |:-----------------------------------------------------------------:|:----------------------------------------------------:|:----------------------------------------------------:|:----------------------------------------------------:|
| **Original** | ![Original Condition View](resources/mv-gallery/1/src/prompt.png) | ![Original View 1](resources/mv-gallery/1/src/0.png) | ![Original View 2](resources/mv-gallery/1/src/1.png) | ![Original View 3](resources/mv-gallery/1/src/2.png) |
| **Pixar style** |  ![Pixar Condition View](resources/mv-gallery/1/edit/prompt.png)  |  ![Pixar View 1](resources/mv-gallery/1/edit/0.png)  |  ![Pixar View 2](resources/mv-gallery/1/edit/1.png)  |  ![Pixar View 3](resources/mv-gallery/1/edit/2.png)  |
| **Wings** | ![Wings Condition View](resources/mv-gallery/1/edit2/prompt.png)  | ![Wings View 1](resources/mv-gallery/1/edit2/0.png)  | ![Wings View 2](resources/mv-gallery/1/edit2/1.png)  | ![Wings View 3](resources/mv-gallery/1/edit2/2.png)  |

<br>

### Person - Old & Zombie

|              |                            Cond. View                             |                        View 1                        |                        View 2                        |                        View 3                        |
|:-------------|:-----------------------------------------------------------------:|:----------------------------------------------------:|:----------------------------------------------------:|:----------------------------------------------------:|
| **Original** | ![Original Condition View](resources/mv-gallery/2/src/prompt.png) | ![Original View 1](resources/mv-gallery/2/src/0.png) | ![Original View 2](resources/mv-gallery/2/src/1.png) | ![Original View 3](resources/mv-gallery/2/src/2.png) |
| **Old**      |  ![Old Condition View](resources/mv-gallery/2/edit/prompt.png)  |  ![Old View 1](resources/mv-gallery/2/edit/0.png)  |  ![Old View 2](resources/mv-gallery/2/edit/1.png)  |  ![Old View 3](resources/mv-gallery/2/edit/2.png)  |
| **Zombie**   | ![Zombie Condition View](resources/mv-gallery/2/edit2/prompt.png)  | ![Zombie View 1](resources/mv-gallery/2/edit2/0.png)  | ![Zombie View 2](resources/mv-gallery/2/edit2/1.png)  | ![Zombie View 3](resources/mv-gallery/2/edit2/2.png)  |


# Project Structure
The repository is organized as follows:
```text
EditP23/
├── examples/              # Example assets for quick testing
│   ├── deer_wings/
│   │   ├── src.png
│   │   ├── edited.png
│   │   └── src_mv.png
│   └── robot_sunglasses/
│       └── ...
├── assets/                # Raw asset files
│   └── stormtrooper.glb
├── scripts/               # Helper scripts for data preparation
│   ├── render_mesh.py
│   └── img2mv.py
├── src/                   # Main source code
│   ├── __init__.py
│   ├── edit_mv.py
│   ├── main.py
│   ├── pipeline.py
│   └── utils.py
├── .gitignore
└── README.md
```

# Utilities

## Setup

This guide shows how to prepare inputs for **EditP23** and run an edit.

These helper scripts create the three PNG files every experiment needs:

| File          | Purpose                                                         |
|---------------|-----------------------------------------------------------------|
| `src.png`     | Original single view (the one you will edit).                   |
| `edited.png`  | Your 2D edit of `src.png`.                                      |
| `src_mv.png`  | 6-view grid of the original object.      |

### 1. Generate `src.png` and `src_mv.png`
**EditP23** needs a **source view** (`src.png`) and a **multi-view grid** (`src_mv.png`).  
The grid contains six extra views at fixed azimuth/elevation pairs:
Angles (azimuth, elevation): `(30°, 20°) (90°, -10°) (150°, 20°) (210°, -10°) (270°, 20°) (330°, -10°)`, plus `(0°, 20°)` for the prompt view.
We provide two methods to generate these inputs; both render the source view and the multi-view grid from these angles on a clean, white background.
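
For inspection or debugging, it can be handy to split the grid back into individual views. A minimal sketch, assuming the Zero123++-style layout of six 320×320 tiles in three rows of two (adjust the tile size if your grid differs); the i-th tile, read row by row, should correspond to the i-th (azimuth, elevation) pair above:
```python
# Split src_mv.png into its six views (assumes a 3-row x 2-column grid of
# 320x320 tiles, as produced by Zero123++ -- adjust if your layout differs).
from PIL import Image

grid = Image.open("examples/robot_sunglasses/src_mv.png")
TILE = 320  # assumed tile size in pixels

for i in range(6):
    row, col = divmod(i, 2)
    box = (col * TILE, row * TILE, (col + 1) * TILE, (row + 1) * TILE)
    grid.crop(box).save(f"view_{i}.png")
```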

#### Method A: From a Single Image

You can generate the multi-view grid from a single image of an object using our `img2mv.py` script. This script leverages the Zero123++ pipeline with a checkpoint from InstantMesh, which is fine-tuned to produce white backgrounds.

```bash
# This script takes a single input image and generates the corresponding multi-view grid.
python scripts/img2mv.py \
  --input_image "examples/robot_sunglasses/src.png" \
  --output_dir "examples/robot_sunglasses/"
```
**Note:** In this case, `src.png` serves as the source view for EditP23.
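
For reference, the core of such a script can be written directly against diffusers. The sketch below mirrors the publicly documented Zero123++/InstantMesh usage and is only an approximation of what `img2mv.py` does; the inference-step count is an assumed value:
```python
# Approximate sketch of multi-view generation with Zero123++ plus the
# InstantMesh white-background UNet (img2mv.py may differ in details).
import torch
from PIL import Image
from diffusers import DiffusionPipeline
from huggingface_hub import hf_hub_download

pipe = DiffusionPipeline.from_pretrained(
    "sudo-ai/zero123plus-v1.2",
    custom_pipeline="sudo-ai/zero123plus-pipeline",
    torch_dtype=torch.float16,
).to("cuda")

# Swap in the InstantMesh UNet, fine-tuned to render white backgrounds.
ckpt = hf_hub_download("TencentARC/InstantMesh", "diffusion_pytorch_model.bin")
pipe.unet.load_state_dict(torch.load(ckpt, map_location="cpu"), strict=True)

cond = Image.open("examples/robot_sunglasses/src.png")
grid = pipe(cond, num_inference_steps=75).images[0]  # 75 steps: assumed value
grid.save("examples/robot_sunglasses/src_mv.png")
```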



#### Method B: From a 3D Mesh
If you have a 3D model, you can use our Blender script to render both the source view and the multi-view grid.
**Prerequisite:** This script requires Blender (`pip install bpy`).

```bash
# This script renders a source view and a multi-view grid from a 3D mesh.
python scripts/render_mesh.py \
  --mesh_path "assets/stormtrooper.glb" \
  --output_dir "examples/stormtrooper/"
```
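
The heart of such a script is a small orbit loop over the angles listed in step 1. The snippet below illustrates the core `bpy` calls; it is not the repository's `render_mesh.py`, and the orbit radius, lighting, and output paths are assumptions:
```python
# Illustrative bpy orbit-render loop (not the repository's render_mesh.py;
# orbit radius, lighting, and output paths are assumptions).
import math
import bpy

ANGLES = [(0, 20), (30, 20), (90, -10), (150, 20), (210, -10), (270, 20), (330, -10)]

bpy.ops.wm.read_factory_settings(use_empty=True)
bpy.ops.import_scene.gltf(filepath="assets/stormtrooper.glb")

# Camera plus an empty at the origin for it to track.
cam = bpy.data.objects.new("cam", bpy.data.cameras.new("cam"))
target = bpy.data.objects.new("target", None)
for obj in (cam, target):
    bpy.context.scene.collection.objects.link(obj)
bpy.context.scene.camera = cam
cam.constraints.new(type="TRACK_TO").target = target  # keep camera aimed at origin

RADIUS = 4.0  # assumed camera distance
for i, (az, el) in enumerate(ANGLES):
    a, e = math.radians(az), math.radians(el)
    cam.location = (RADIUS * math.cos(e) * math.sin(a),
                    -RADIUS * math.cos(e) * math.cos(a),
                    RADIUS * math.sin(e))
    bpy.context.scene.render.filepath = f"examples/stormtrooper/view_{i}.png"
    bpy.ops.render.render(write_still=True)
```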

### 2. Generate `edited.png`
Once you have your **source view**, you can use any 2D image editor to make your desired changes. We use this user-provided edit to guide the 3D modification.
For quick edits, you can use readily available online tools, such as the following HuggingFace Spaces:
- [FlowEdit](https://huggingface.co/spaces/fallenshock/FlowEdit): Excellent for global, structural edits.
- [Flux-Inpainting](https://huggingface.co/spaces/black-forest-labs/FLUX.1-Fill-dev): Great for local modifications and inpainting.


## Reconstruction
After generating an edited multi-view image (`edited_mv.png`) with our main script, you can reconstruct it into a 3D model. We provide a helper script that uses the [InstantMesh](https://github.com/TencentARC/InstantMesh) framework to produce a textured `.obj` file and a turntable video.


### Additional Dependencies
First, you'll need to install several libraries required for the reconstruction process.

<details>
<summary>Click to expand installation instructions</summary>

```bash
# Install general dependencies
pip install opencv-python einops xatlas "imageio[ffmpeg]"

# Install NVIDIA's nvdiffrast library
pip install git+https://github.com/NVlabs/nvdiffrast/

# For video export, ensure ffmpeg is installed
# On conda, you can run:
conda install ffmpeg
```
</details>

### Running the Reconstruction
The reconstruction script takes the multi-view PNG as input and generates the 3D assets. The necessary model config file (`instant-mesh-large.yaml`) is included in the `configs/` directory of the InstantMesh repository.
#### Example Command
```bash
python scripts/recon.py \
  external/instant-mesh/configs/instant-mesh-large.yaml \
  --input_file "examples/robot_sunglasses/output/edited_mv.png" \
  --output_dir "examples/robot_sunglasses/output/recon/"
```

### Command-Line Arguments
Here are the arguments for the `recon.py` script:

| Argument      | Description                                                        | Default      |
| :------------ | :----------------------------------------------------------------- | :----------- |
| `config`      | **(Required)** Path to the InstantMesh model config file.          |              |
| `--input_file`| **(Required)** Path to the multi-view PNG file you want to reconstruct. |              |
| `--output_dir`| Directory where the output `.obj` and `.mp4` files will be saved.  | `"outputs/"` |
| `--scale`     | Scale of the input cameras.                                        | `1.0`        |
| `--distance`  | Camera distance for rendering the output video.                    | `4.5`        |
| `--no_video`  | A flag to disable saving the `.mp4` video.                         | `False`      |