---
title: EditP23
emoji: 🎨
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 5.38.2
app_file: app.py
pinned: false
---
# EditP23: 3D Editing via Propagation of Image Prompts to Multi-View
[Project Page](https://editp23.github.io/)
[arXiv](https://arxiv.org/abs/2506.20652)
This repository contains the official implementation for **EditP23**, a method for fast, mask-free 3D editing that propagates 2D image edits to multi-view representations in a 3D-consistent manner.
The edit is guided by an image pair, allowing users to leverage any preferred 2D editing tool, from manual painting to generative pipelines.
### Installation
<details>
<summary>Click to expand installation instructions</summary>
This project was tested on a Linux system with Python 3.11 and CUDA 12.6.
**1. Clone the Repository**
```bash
git clone --recurse-submodules https://github.com/editp23/EditP23.git
cd EditP23
```
**2. Install Dependencies**
```bash
conda create -n editp23 python=3.11 -y
conda activate editp23
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu126 # Ensure compatibility with your CUDA version. (tested with torch 2.6, cuda 12.6)
pip install diffusers==0.30.1 transformers accelerate pillow huggingface_hub numpy tqdm
```
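After installing, you can optionally confirm that PyTorch sees your GPU and that the pinned diffusers version is in place (a quick sanity check, not part of the official setup):
```bash
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"
python -c "import diffusers; print(diffusers.__version__)"  # should print 0.30.1
```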
</details>
### Quick Start
**1. Prepare Your Experiment Directory**
Create a directory for your experiment. Inside this directory, you must place three specific PNG files:
* `src.png`: The original, unedited view of your object.
* `edited.png`: The same view after you have applied your desired 2D edit.
* `src_mv.png`: The multi-view grid of the original object, which will be edited.
Your directory structure should look like this:
```text
examples/
└── robot_sunglasses/
    ├── src.png
    ├── edited.png
    └── src_mv.png
```
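Before running, you can sanity-check an experiment directory with a short script (a minimal sketch; `check_exp_dir` is a hypothetical helper, not part of the repository):
```python
# check that the three required PNGs exist and that src/edited views match in size
from pathlib import Path
from PIL import Image

def check_exp_dir(exp_dir: str) -> None:
    exp = Path(exp_dir)
    for name in ("src.png", "edited.png", "src_mv.png"):
        assert (exp / name).exists(), f"missing {exp / name}"
    src = Image.open(exp / "src.png")
    edited = Image.open(exp / "edited.png")
    # the edited view should have the same dimensions as the source view
    assert src.size == edited.size, f"size mismatch: {src.size} vs {edited.size}"
    print(f"{exp} looks good: src/edited are {src.size}")

check_exp_dir("examples/robot_sunglasses")
```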
**2. Run the Editing Script**
Execute the `main.py` script, pointing it to your experiment directory. You can adjust the guidance parameters based on the complexity of your edit.
#### Execution Examples
* **Mild Edit (Appearance Change):**
```bash
python src/main.py --exp_dir examples/robot_sunglasses --tar_guidance_scale 5.0 --n_max 31
```
* **Hard Edit (Large Geometry Change):**
```bash
python src/main.py --exp_dir examples/deer_wings --tar_guidance_scale 21.0 --n_max 39
```
The output will be saved in the `output/` subdirectory within your experiment folder.
### Command-Line Arguments
* `--exp_dir`: (Required) Path to the experiment directory.
* `--T_steps`: Total number of denoising steps. Default: `50`.
* `--n_max`: Number of denoising steps during which edit-aware guidance is applied. Higher values can help with more complex edits; must not exceed `T_steps`. Default: `31`.
* `--src_guidance_scale`: CFG scale for the source condition; usually fine at its default. Default: `3.5`.
* `--tar_guidance_scale`: CFG scale for the target (edited) condition. Higher values apply the edit more strongly. Default: `5.0`.
* `--seed`: Random seed for reproducibility. Default: `18`.
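If you are unsure how strong an edit needs to be, one option is to sweep the target guidance scale and compare the results (a sketch of a hypothetical workflow; successive runs write to the same `output/` directory, hence the copy step):
```bash
# try a few target guidance scales, saving each result aside
for g in 5.0 13.0 21.0; do
    python src/main.py --exp_dir examples/deer_wings --tar_guidance_scale "$g" --n_max 39
    cp -r examples/deer_wings/output "examples/deer_wings/output_g${g}"
done
```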
# Results in Multi-View
### Deer - Pixar style & Wings
| | Cond. View | View 1 | View 2 | View 3 |
| :--- | :---: | :---: | :---: | :---: |
| **Original** |  |  |  |  |
| **Pixar style** |  |  |  |  |
| **Wings** |  |  |  |  |
<br>
### Person - Old & Zombie
| | Cond. View | View 1 | View 2 | View 3 |
| :--- | :---: | :---: | :---: | :---: |
| **Original** |  |  |  |  |
| **Old** |  |  |  |  |
| **Zombie** |  |  |  |  |
# Project Structure
The repository is organized as follows:
```text
EditP23/
├── examples/              # Example assets for quick testing
│   ├── deer_wings/
│   │   ├── src.png
│   │   ├── edited.png
│   │   └── src_mv.png
│   └── robot_sunglasses/
│       └── ...
├── assets/                # Raw asset files
│   └── stormtrooper.glb
├── scripts/               # Helper scripts for data preparation
│   ├── render_mesh.py
│   └── img2mv.py
├── src/                   # Main source code
│   ├── __init__.py
│   ├── edit_mv.py
│   ├── main.py
│   ├── pipeline.py
│   └── utils.py
├── .gitignore
└── README.md
```
# Utilities
## Setup
This guide shows how to prepare inputs for **EditP23** and run an edit.
These helper scripts create the three PNG files every experiment needs:
| File | Purpose |
|---------------|-----------------------------------------------------------------|
| `src.png` | Original single view (the one you will edit). |
| `edited.png` | Your 2D edit of `src.png`. |
| `src_mv.png` | 6-view grid of the original object. |
### 1. Generate `src.png` and `src_mv.png`
**EditP23** needs a **source view** (`src.png`) and a **multi-view grid** (`src_mv.png`).
The grid contains six extra views at fixed azimuth/elevation pairs:
Angles (azimuth, elevation): `(30°, 20°) (90°, -10°) (150°, 20°) (210°, -10°) (270°, 20°) (330°, -10°)` and, for the prompt view, `(0°, 20°)`.
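For reference, each (azimuth, elevation) pair maps to a camera position on a sphere around the object; a minimal sketch of the conversion (assuming a Z-up coordinate system and unit camera distance, which may differ from the conventions used internally):
```python
import math

def camera_position(azimuth_deg: float, elevation_deg: float, radius: float = 1.0):
    """Spherical (azimuth, elevation) -> Cartesian camera position, Z-up."""
    az, el = math.radians(azimuth_deg), math.radians(elevation_deg)
    x = radius * math.cos(el) * math.cos(az)
    y = radius * math.cos(el) * math.sin(az)
    z = radius * math.sin(el)
    return (x, y, z)

# the six grid views used by EditP23
views = [(30, 20), (90, -10), (150, 20), (210, -10), (270, 20), (330, -10)]
for az, el in views:
    print(f"az={az:3d}°, el={el:3d}° -> {camera_position(az, el)}")
```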
We provide two methods to generate these inputs; both produce the source view and the multi-view grid from the angles above on a clean, white background.
#### Method A: From a Single Image
You can generate the multi-view grid from a single image of an object using our `img2mv.py` script. This script leverages the Zero123++ pipeline with a checkpoint from InstantMesh, which is fine-tuned to produce white backgrounds.
```bash
# This script takes a single input image and generates the corresponding multi-view grid.
python scripts/img2mv.py \
--input_image "examples/robot_sunglasses/src.png" \
--output_dir "examples/robot_sunglasses/"
```
**Note:** In this case, `src.png` serves as the source view for EditP23.
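For context, the core of this step looks roughly like the following (a sketch of the idea, not the actual `img2mv.py`; the checkpoint and pipeline names are assumptions based on the public Zero123++ release):
```python
# run Zero123++ through diffusers to produce the 3x2 multi-view grid
import torch
from PIL import Image
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained(
    "sudo-ai/zero123plus-v1.2",
    custom_pipeline="sudo-ai/zero123plus-pipeline",
    torch_dtype=torch.float16,
).to("cuda")

cond = Image.open("examples/robot_sunglasses/src.png")
grid = pipe(cond, num_inference_steps=75).images[0]  # six views in a 3x2 grid
grid.save("examples/robot_sunglasses/src_mv.png")
```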
#### Method B: From a 3D Mesh
If you have a 3D model, you can use our Blender script to render both the source view and the multi-view grid.
**Prerequisite:** This script requires Blender (`pip install bpy`).
```bash
# This script renders a source view and a multi-view grid from a 3D mesh.
python scripts/render_mesh.py \
--mesh_path "assets/stormtrooper.glb" \
--output_dir "examples/stormtrooper/"
```
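Under the hood, the script places cameras at the angle pairs listed above and renders stills; a heavily simplified sketch of a single render with the `bpy` API (assumed structure, not the actual script, and without the white-background setup):
```python
# render one view of a GLB with Blender's Python API (simplified sketch)
import math
import bpy

bpy.ops.wm.read_factory_settings(use_empty=True)   # start from an empty scene
bpy.ops.import_scene.gltf(filepath="assets/stormtrooper.glb")

# camera at azimuth 0°, elevation 20°, distance 4
az, el, r = math.radians(0), math.radians(20), 4.0
cam = bpy.data.objects.new("cam", bpy.data.cameras.new("cam"))
cam.location = (r * math.cos(el) * math.cos(az),
                r * math.cos(el) * math.sin(az),
                r * math.sin(el))
bpy.context.scene.collection.objects.link(cam)

# aim the camera at the origin with a track-to constraint
target = bpy.data.objects.new("target", None)      # an empty at the origin
bpy.context.scene.collection.objects.link(target)
track = cam.constraints.new(type="TRACK_TO")
track.target = target

bpy.context.scene.camera = cam
bpy.context.scene.render.filepath = "examples/stormtrooper/src.png"
bpy.ops.render.render(write_still=True)
```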
### 2. Generating `edited.png`
Once you have your **source view**, you can use any 2D image editor to make your desired changes. We use this user-provided edit to guide the 3D modification.
For quick edits, you can use readily available online tools, such as the following HuggingFace Spaces:
- [FlowEdit](https://huggingface.co/spaces/fallenshock/FlowEdit): Excellent for global, structural edits.
- [Flux-Inpainting](https://huggingface.co/spaces/black-forest-labs/FLUX.1-Fill-dev): Great for local modifications and inpainting.
## Reconstruction
After generating an edited multi-view image (`edited_mv.png`) with our main script, you can reconstruct it into a 3D model. We provide a helper script that uses the [InstantMesh](https://github.com/TencentARC/InstantMesh) framework to produce a textured `.obj` file and a turntable video.
### Additional Dependencies
First, you'll need to install several libraries required for the reconstruction process.
<details>
<summary>Click to expand installation instructions</summary>
```bash
# Install general dependencies
pip install opencv-python einops xatlas imageio[ffmpeg]
# Install NVIDIA's nvdiffrast library
pip install git+https://github.com/NVlabs/nvdiffrast/
# For video export, ensure ffmpeg is installed
# On conda, you can run:
conda install ffmpeg
```
</details>
### Running the Reconstruction
The reconstruction script takes the multi-view PNG as input and generates the 3D assets. The necessary model config file (`instant-mesh-large.yaml`) is included in the `configs/` directory of the InstantMesh repository.
#### Example Command
```bash
python scripts/recon.py \
    external/instant-mesh/configs/instant-mesh-large.yaml \
    --input_file "examples/robot_sunglasses/output/edited_mv.png" \
    --output_dir "examples/robot_sunglasses/output/recon/"
```
### Command-Line Arguments
Here are the arguments for the `recon.py` script:
| Argument | Description | Default |
| :------------ | :----------------------------------------------------------------- | :----------- |
| `config` | **(Required)** Path to the InstantMesh model config file. | |
| `--input_file`| **(Required)** Path to the multi-view PNG file you want to reconstruct. | |
| `--output_dir`| Directory where the output `.obj` and `.mp4` files will be saved. | `"outputs/"` |
| `--scale` | Scale of the input cameras. | `1.0` |
| `--distance` | Camera distance for rendering the output video. | `4.5` |
| `--no_video` | A flag to disable saving the `.mp4` video. | `False` |
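Putting the pieces together, an end-to-end run over the example above might look like this (a sketch combining the commands from earlier sections; paths follow the example layout):
```bash
# 1. generate the multi-view grid from a single source image
python scripts/img2mv.py \
    --input_image "examples/robot_sunglasses/src.png" \
    --output_dir "examples/robot_sunglasses/"

# 2. propagate the 2D edit (edited.png prepared beforehand) to all views
python src/main.py --exp_dir examples/robot_sunglasses --tar_guidance_scale 5.0 --n_max 31

# 3. reconstruct the edited multi-view grid into a textured mesh + turntable video
python scripts/recon.py \
    external/instant-mesh/configs/instant-mesh-large.yaml \
    --input_file "examples/robot_sunglasses/output/edited_mv.png" \
    --output_dir "examples/robot_sunglasses/output/recon/"
```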