wyysf committed on
Commit
0f079b2
·
0 Parent(s):
This view is limited to 50 files because it contains too many changes. See the raw diff for the rest.
Files changed (50)
  1. .gitattributes +40 -0
  2. .gitignore +1 -0
  3. README.md +11 -0
  4. README_zh.md +173 -0
  5. apps/.vscode/launch.json +15 -0
  6. apps/__pycache__/mv_models.cpython-310.pyc +0 -0
  7. apps/__pycache__/mv_models.cpython-38.pyc +0 -0
  8. apps/__pycache__/utils.cpython-310.pyc +0 -0
  9. apps/__pycache__/utils.cpython-38.pyc +0 -0
  10. apps/examples/1_cute_girl.webp +0 -0
  11. apps/examples/blue_monster.webp +0 -0
  12. apps/examples/boy.webp +0 -0
  13. apps/examples/boy2.webp +0 -0
  14. apps/examples/bulldog.webp +0 -0
  15. apps/examples/catman.webp +0 -0
  16. apps/examples/cyberpunk_man.webp +0 -0
  17. apps/examples/dinosaur_boy.webp +0 -0
  18. apps/examples/dog.webp +0 -0
  19. apps/examples/doraemon.webp +0 -0
  20. apps/examples/dragon.webp +0 -0
  21. apps/examples/elf.webp +0 -0
  22. apps/examples/ghost-eating-burger.webp +0 -0
  23. apps/examples/girl1.webp +0 -0
  24. apps/examples/gun.webp +0 -0
  25. apps/examples/kunkun.webp +0 -0
  26. apps/examples/link.webp +0 -0
  27. apps/examples/mushroom1.webp +0 -0
  28. apps/examples/mushroom2.webp +0 -0
  29. apps/examples/pikachu.webp +0 -0
  30. apps/examples/plants.webp +0 -0
  31. apps/examples/rose.webp +0 -0
  32. apps/examples/shoe.webp +0 -0
  33. apps/examples/sports_girl.webp +0 -0
  34. apps/examples/stone.webp +0 -0
  35. apps/examples/sweater.webp +0 -0
  36. apps/examples/sword.webp +0 -0
  37. apps/examples/teapot.webp +0 -0
  38. apps/examples/toy1.webp +0 -0
  39. apps/examples/toy_bear.webp +0 -0
  40. apps/examples/toy_dog.webp +0 -0
  41. apps/examples/toy_pig.webp +0 -0
  42. apps/examples/toy_rabbit.webp +0 -0
  43. apps/examples/wings.webp +0 -0
  44. apps/gradio_app.py +272 -0
  45. apps/mv_models.py +162 -0
  46. apps/third_party/CRM/.gitignore +155 -0
  47. apps/third_party/CRM/LICENSE +21 -0
  48. apps/third_party/CRM/README.md +85 -0
  49. apps/third_party/CRM/__init__.py +0 -0
  50. apps/third_party/CRM/app.py +228 -0
.gitattributes ADDED
@@ -0,0 +1,40 @@
+ *.7z filter=lfs diff=lfs merge=lfs -text
+ *.arrow filter=lfs diff=lfs merge=lfs -text
+ *.bin filter=lfs diff=lfs merge=lfs -text
+ *.bz2 filter=lfs diff=lfs merge=lfs -text
+ *.ckpt filter=lfs diff=lfs merge=lfs -text
+ *.ftz filter=lfs diff=lfs merge=lfs -text
+ *.gz filter=lfs diff=lfs merge=lfs -text
+ *.h5 filter=lfs diff=lfs merge=lfs -text
+ *.joblib filter=lfs diff=lfs merge=lfs -text
+ *.lfs.* filter=lfs diff=lfs merge=lfs -text
+ *.mlmodel filter=lfs diff=lfs merge=lfs -text
+ *.model filter=lfs diff=lfs merge=lfs -text
+ *.msgpack filter=lfs diff=lfs merge=lfs -text
+ *.npy filter=lfs diff=lfs merge=lfs -text
+ *.npz filter=lfs diff=lfs merge=lfs -text
+ *.onnx filter=lfs diff=lfs merge=lfs -text
+ *.ot filter=lfs diff=lfs merge=lfs -text
+ *.parquet filter=lfs diff=lfs merge=lfs -text
+ *.pb filter=lfs diff=lfs merge=lfs -text
+ *.pickle filter=lfs diff=lfs merge=lfs -text
+ *.pkl filter=lfs diff=lfs merge=lfs -text
+ *.pt filter=lfs diff=lfs merge=lfs -text
+ *.pth filter=lfs diff=lfs merge=lfs -text
+ *.rar filter=lfs diff=lfs merge=lfs -text
+ *.safetensors filter=lfs diff=lfs merge=lfs -text
+ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
+ *.tar.* filter=lfs diff=lfs merge=lfs -text
+ *.tar filter=lfs diff=lfs merge=lfs -text
+ *.tflite filter=lfs diff=lfs merge=lfs -text
+ *.tgz filter=lfs diff=lfs merge=lfs -text
+ *.wasm filter=lfs diff=lfs merge=lfs -text
+ *.xz filter=lfs diff=lfs merge=lfs -text
+ *.zip filter=lfs diff=lfs merge=lfs -text
+ *.zst filter=lfs diff=lfs merge=lfs -text
+ *tfevents* filter=lfs diff=lfs merge=lfs -text
+ apps/ckpts/InstantMeshes filter=lfs diff=lfs merge=lfs -text
+ apps/third_party/Wonder3D/assets/fig_teaser.png filter=lfs diff=lfs merge=lfs -text
+ asset/video_cover.png filter=lfs diff=lfs merge=lfs -text
+ apps/InstantMeshes filter=lfs diff=lfs merge=lfs -text
+ apps/third_party/InstantMeshes filter=lfs diff=lfs merge=lfs -text
.gitignore ADDED
@@ -0,0 +1 @@
+ gradio_cached_dir
README.md ADDED
@@ -0,0 +1,11 @@
+ ---
+ title: 'CraftsMan: High-fidelity Mesh Generation with 3D Native Generation and Interactive Geometry Refiner'
+ emoji: 🚀
+ colorFrom: indigo
+ colorTo: pink
+ sdk: gradio
+ sdk_version: 4.31.5
+ app_file: gradio_app.py
+ pinned: false
+ license: agpl-3.0
+ ---
README_zh.md ADDED
@@ -0,0 +1,173 @@
+ <p align="center">
+ <img src="asset/logo.png" height=220>
+ </p>
+
+ ### <div align="center">CraftsMan (匠心): High-fidelity Mesh Generation with 3D Native Diffusion and Interactive Geometry Refiner</div>
+ ##### <p align="center"> [Weiyu Li<sup>1,2</sup>](https://wyysf-98.github.io/), Jiarui Liu<sup>1,2</sup>, [Rui Chen<sup>1,2</sup>](https://aruichen.github.io/), [Yixun Liang<sup>3,2</sup>](https://yixunliang.github.io/), [Xuelin Chen<sup>4</sup>](https://xuelin-chen.github.io/), [Ping Tan<sup>1,2</sup>](https://ece.hkust.edu.hk/pingtan), [Xiaoxiao Long<sup>5</sup>](https://www.xxlong.site/)</p>
+ ##### <p align="center"> <sup>1</sup>HKUST, <sup>2</sup>LightIllusions, <sup>3</sup>HKUST (Guangzhou), <sup>4</sup>Tencent AI Lab, <sup>5</sup>HKU</p>
+ <div align="center">
+ <a href="https://github.com/Craftsman3D.github.io/"><img src="https://img.shields.io/static/v1?label=Project%20Page&message=Github&color=blue&logo=github-pages"></a> &ensp;
+ <a href="https://huggingface.co/"><img src="https://img.shields.io/static/v1?label=SAM-LLaVA&message=HF&color=yellow"></a> &ensp;
+ <a href="https://arxiv.org/abs/xxx"><img src="https://img.shields.io/static/v1?label=Paper&message=Arxiv&color=red&logo=arxiv"></a> &ensp;
+ </div>
+
+ #### TL;DR: <font color="red">**CraftsMan (aka 匠心)**</font> is a two-stage text/image-to-3D mesh generation model. Mimicking the modeling workflow of artists and craftsmen, we first use a 3D diffusion model to generate a coarse mesh with smooth geometry (about 5 seconds), and then refine it using enhanced multi-view normal maps produced by a 2D normal diffusion model (about 20 seconds); the refinement can also be carried out interactively, in a ZBrush-like manner.
+
+
+ ## ✨ Overview
+ This repository contains the source code (training/inference), pretrained weights, and gradio demo code of our 3D mesh generation project. You can find more visualizations on our [project page](https://github.com/Craftsman3D.github.io/). If you have high-quality 3D data or other ideas, we warmly welcome any form of collaboration.
+ <details><summary>Full abstract</summary>
+ We present a novel 3D modeling system, CraftsMan, which can generate high-fidelity 3D geometry with diverse shapes, regular mesh topology, and smooth surfaces, and which, notably, allows the geometry to be refined interactively, much like a manual modeling workflow. Despite significant progress in 3D generation, existing methods still struggle with lengthy optimization, irregular mesh topology, noisy surfaces, and difficulty accommodating user edits, which hinders their adoption in 3D modeling software. Our work is inspired by craftsmen, who usually rough out the overall shape of a piece first and then elaborate its surface details. Specifically, we employ a 3D native diffusion model that operates on a latent space learned from a latent-set-based 3D representation and generates coarse geometry with regular mesh topology in a few seconds. In particular, this process takes a text prompt or a reference image as input and leverages a powerful multi-view (MV) 2D diffusion model to generate multiple views of the coarse geometry, which are fed into our MV-conditioned 3D diffusion model to generate the 3D geometry, significantly improving robustness and generalization. Afterwards, a normal-based geometry refiner markedly enhances the surface details. The refinement can be performed automatically, or interactively with user-supplied edits. Extensive experiments show that our method is highly effective at producing high-quality 3D assets, outperforming existing methods.
+ </details>
+
+ <p align="center">
+ <img src="asset/teaser.jpg" >
+ </p>
+
+
+ ## Contents
+ * [Video](#Video)
+ * [Pretrained Models](##-Pretrained-models)
+ * [Gradio & Huggingface Demo](#Gradio-demo)
+ * [Inference Code](#Inference)
+ * [Training Code](#Train)
+ * [Data Preparation](#data)
+ * [Acknowledgements](#Acknowledgements)
+ * [Citation](#Bibtex)
+
+ ## Environment Setup
+
+ <details> <summary>Hardware</summary>
+ We train the model on 32 A800 GPUs with a per-GPU batch size of 32 for 7 days.
+
+ The mesh refinement stage runs on a GTX 3080 GPU.
+
+
+ </details>
+ <details> <summary>Software</summary>
+
+ :smiley: For ease of use, we provide a Docker image: [Setup using Docker](./docker/README.md).
+
+ - Python 3.10.0
+ - PyTorch 2.1.0
+ - Cuda Toolkit 11.8.0
+ - Ubuntu 22.04
+
+ Clone this repository.
+
+ ```sh
+ git clone git@github.com:wyysf-98/CraftsMan.git
+ ```
+
+ Install the required dependencies.
+
+ ```sh
+ conda create -n CraftsMan python=3.10
+ conda activate CraftsMan
+ conda install -c pytorch pytorch=2.3.0 torchvision=0.18.0 cudatoolkit=11.8 && \
+ pip install -r docker/requirements.txt
+ ```
+
+ </details>
+
+
+ # 🎥 Video
+
+ [![Watch the video](asset/video_cover.png)](https://www.youtube.com/watch?v=WhEs4tS4mGo)
+
+
+ # 3D Native Diffusion Model (Latent Set Diffusion Model)
+ We provide the training and inference code here to facilitate future research.
+ The latent set diffusion model is largely based on [Michelangelo](https://github.com/NeuralCarver/Michelangelo),
+ adopts the [perceiver](https://github.com/google-deepmind/deepmind-research/blob/master/perceiver/perceiver.py) architecture, and has only 104M parameters.
+
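+ As a rough illustration of the idea (a toy sketch, not the actual CraftsMan/Michelangelo code), a latent set encoder lets a small set of learned latent queries cross-attend to features sampled from the input; the sizes below (256 latents, width 64) only echo the `l256-e64` config naming and are otherwise arbitrary:
+
+ ```python
+ # Toy perceiver-style latent set encoder (illustrative only, not the repo's implementation).
+ import torch
+ import torch.nn as nn
+
+ class LatentSetEncoder(nn.Module):
+     def __init__(self, num_latents=256, dim=64, num_heads=8):
+         super().__init__()
+         # a fixed, learned set of latent queries
+         self.latents = nn.Parameter(torch.randn(num_latents, dim) * 0.02)
+         self.cross_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
+         self.ff = nn.Sequential(nn.LayerNorm(dim), nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
+
+     def forward(self, point_feats):  # point_feats: (B, N, dim), e.g. features of surface samples
+         q = self.latents.unsqueeze(0).expand(point_feats.shape[0], -1, -1)
+         x, _ = self.cross_attn(q, point_feats, point_feats)  # latents attend to the input set
+         return x + self.ff(x)  # (B, num_latents, dim): the latent set
+
+ print(LatentSetEncoder()(torch.randn(2, 1024, 64)).shape)  # torch.Size([2, 256, 64])
+ ```
+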
+ ## Pretrained Models
+ Currently, we provide a model that is conditioned on 4-view images and injects camera information into the CLIP feature extractor via ModLN.
+ We will consider open-sourcing further models as circumstances allow.
+
+ Our inference script downloads the models automatically. Alternatively, you can download them manually and place them under the ckpts/ directory.
+
+
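+ If you prefer to fetch the files yourself, a minimal sketch with `huggingface_hub` looks like this (the repo id and file names are the ones used by `apps/gradio_app.py`; downloading into `ckpts/` via `local_dir` is an assumption, adjust it to your layout):
+
+ ```python
+ # Manual download sketch mirroring the hf_hub_download calls in apps/gradio_app.py.
+ from huggingface_hub import hf_hub_download
+
+ ckpt_path = hf_hub_download(
+     repo_id="wyysf/CraftsMan",
+     filename="image-to-shape-diffusion/clip-mvrgb-modln-l256-e64-ne8-nd16-nl6/model.ckpt",
+     repo_type="model",
+     local_dir="ckpts",  # assumption: keep checkpoints under ckpts/
+ )
+ config_path = hf_hub_download(
+     repo_id="wyysf/CraftsMan",
+     filename="image-to-shape-diffusion/clip-mvrgb-modln-l256-e64-ne8-nd16-nl6/config.yaml",
+     repo_type="model",
+     local_dir="ckpts",
+ )
+ print(ckpt_path, config_path)
+ ```
+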
+ ## Gradio Demo
+ We provide gradio demos with different text/image-to-multi-view diffusion models, such as [CRM](https://github.com/thu-ml/CRM), [Wonder3D](https://github.com/xxlong0/Wonder3D/) and [LGM](https://github.com/3DTopia/LGM). You can switch between models to get better results. To run the gradio demo on your local machine, simply run:
+
+ ```bash
+ python apps/gradio_app.py
+ ```
+
+ ## Inference
+ To generate 3D meshes from a folder of images via the command line, simply run:
+
+ ```bash
+ python launch.py --config ./configs/image-to-shape-diffusion/clip-mvrgb-modln-l256-e64-ne8-nd16-nl6.yaml \
+ --validate --gpu 0
+ ```
+ By default we use [rembg](https://github.com/danielgatis/rembg) to segment the foreground object. If the input image already has an alpha mask, specify the no_rembg flag.
+
+ If you have images of the other views (left, right, back), you can provide them as input as well.
+
+
+ ## Training from Scratch
+ We provide our training code to facilitate future research. We will release a small set of data samples in the next few days.
+ For more training details and configurations, please refer to the configs folder.
+
+ ```bash
+ ### training the shape-autoencoder
+ python launch.py --config ./configs/shape-autoencoder/l256-e64-ne8-nd16.yaml \
+ --train --gpu 0
+
+ ### training the image-to-shape diffusion model
+ python launch.py --config ./configs/image-to-shape-diffusion/clip-mvrgb-modln-l256-e64-ne8-nd16-nl6.yaml \
+ --train --gpu 0
+
+ ```
+
+ # 2D Normal-Enhanced Diffusion Model (Coming Soon)
+
+ We are working hard to release our mesh refinement code. Thank you for your patience as we put the finishing touches on this exciting development. 🔧🚀
+
+ You can also find results of the mesh refinement stage in the video.
+
+
+ # ❓ FAQ
+ Q: How can I get better results?
+ 1. CraftsMan uses multi-view images as the condition of the 3D diffusion model. In our experiments, our method is more robust to multi-view inconsistencies than reconstruction models such as [Wonder3D](https://github.com/xxlong0/Wonder3D/) and [InstantMesh](https://github.com/TencentARC/InstantMesh/tree/main). Since we rely on an image-to-MV model, the facing direction of the input image matters a great deal, and a well-oriented input generally leads to a good reconstruction.
+ 2. If you have your own multi-view images, uploading them directly is a good option.
+ 3. As with 2D diffusion models, try different random seeds, adjust the CFG scale, or try a different scheduler.
+ 4. We will consider providing a version conditioned on text prompts later, so that you can use positive and negative prompts.
+
+
+ # 💪 Todo List
+
+ - [x] Inference code
+ - [x] Training code
+ - [x] Gradio & Hugging Face demo
+ - [x] Model zoo; more checkpoints will be released in the future
+ - [ ] Environment setup
+ - [ ] Data samples
+ - [ ] Google Colab example
+ - [ ] Mesh refinement code
+
+
+ # 🤗 Acknowledgements
+
+ - Thanks to [LightIllusions (光影幻像)](https://www.lightillusions.com/) for providing compute resources and to 潘建雄 for data preprocessing. If you have any ideas about high-quality 3D generation, feel free to contact us!
+ - Thanks to [Hugging Face](https://github.com/huggingface) for sponsoring the nice demo!
+ - Thanks to [3DShape2VecSet](https://github.com/1zb/3DShape2VecSet/tree/master) for their amazing work; the latent set representation provides an efficient way to represent 3D shapes!
+ - Thanks to [Michelangelo](https://github.com/NeuralCarver/Michelangelo) for their great work; our model structure is heavily built on this repo!
+ - Thanks to [CRM](https://github.com/thu-ml/CRM), [Wonder3D](https://github.com/xxlong0/Wonder3D/) and [LGM](https://github.com/3DTopia/LGM) for their released multi-view image generation models. If you have a more advanced version and want to contribute it to the community, we welcome updates.
+ - Thanks to [Objaverse](https://objaverse.allenai.org/) and [Objaverse-MIX](https://huggingface.co/datasets/BAAI/Objaverse-MIX/tree/main) for open-sourcing their data, which helped us run many validation experiments.
+ - Thanks to [ThreeStudio](https://github.com/threestudio-project/threestudio) for implementing a complete framework; we refer to their excellent and easy-to-use code structure.
+
+ # 📑 License
+ CraftsMan is under [AGPL-3.0](https://www.gnu.org/licenses/agpl-3.0.en.html), so any downstream solution or product (including cloud services) that includes CraftsMan code or a trained model (whether pretrained or custom trained) should be open-sourced to comply with the AGPL conditions. If you have any questions about the use of CraftsMan, please contact us first.
+
+ # 📖 BibTeX
+
+ @misc{li2024craftsman,
+     title = {CraftsMan: High-fidelity Mesh Generation with 3D Native Generation and Interactive Geometry Refiner},
+     author = {Weiyu Li and Jiarui Liu and Rui Chen and Yixun Liang and Xuelin Chen and Ping Tan and Xiaoxiao Long},
+     year = {2024},
+     archivePrefix = {arXiv},
+     primaryClass = {cs.CG}
+ }
apps/.vscode/launch.json ADDED
@@ -0,0 +1,15 @@
+ {
+     // Use IntelliSense to learn about possible attributes.
+     // Hover to view descriptions of existing attributes.
+     // For more information, visit: https://go.microsoft.com/fwlink/?linkid=830387
+     "version": "0.2.0",
+     "configurations": [
+         {
+             "name": "Python Debugger: Current File",
+             "type": "debugpy",
+             "request": "launch",
+             "program": "${file}",
+             "console": "integratedTerminal"
+         }
+     ]
+ }
apps/__pycache__/mv_models.cpython-310.pyc ADDED
Binary file (5.38 kB).
 
apps/__pycache__/mv_models.cpython-38.pyc ADDED
Binary file (5.33 kB).
 
apps/__pycache__/utils.cpython-310.pyc ADDED
Binary file (7.54 kB).
 
apps/__pycache__/utils.cpython-38.pyc ADDED
Binary file (7.52 kB).
 
apps/examples/1_cute_girl.webp ADDED
apps/examples/blue_monster.webp ADDED
apps/examples/boy.webp ADDED
apps/examples/boy2.webp ADDED
apps/examples/bulldog.webp ADDED
apps/examples/catman.webp ADDED
apps/examples/cyberpunk_man.webp ADDED
apps/examples/dinosaur_boy.webp ADDED
apps/examples/dog.webp ADDED
apps/examples/doraemon.webp ADDED
apps/examples/dragon.webp ADDED
apps/examples/elf.webp ADDED
apps/examples/ghost-eating-burger.webp ADDED
apps/examples/girl1.webp ADDED
apps/examples/gun.webp ADDED
apps/examples/kunkun.webp ADDED
apps/examples/link.webp ADDED
apps/examples/mushroom1.webp ADDED
apps/examples/mushroom2.webp ADDED
apps/examples/pikachu.webp ADDED
apps/examples/plants.webp ADDED
apps/examples/rose.webp ADDED
apps/examples/shoe.webp ADDED
apps/examples/sports_girl.webp ADDED
apps/examples/stone.webp ADDED
apps/examples/sweater.webp ADDED
apps/examples/sword.webp ADDED
apps/examples/teapot.webp ADDED
apps/examples/toy1.webp ADDED
apps/examples/toy_bear.webp ADDED
apps/examples/toy_dog.webp ADDED
apps/examples/toy_pig.webp ADDED
apps/examples/toy_rabbit.webp ADDED
apps/examples/wings.webp ADDED
apps/gradio_app.py ADDED
@@ -0,0 +1,272 @@
1
+ import argparse
2
+ import os
3
+ import json
4
+ import torch
5
+ import sys
6
+ import time
7
+ import importlib
8
+ import numpy as np
9
+ from omegaconf import OmegaConf
10
+ from huggingface_hub import hf_hub_download
11
+
12
+ from collections import OrderedDict
13
+ import trimesh
14
+ from einops import repeat, rearrange
15
+ import pytorch_lightning as pl
16
+ from typing import Dict, Optional, Tuple, List
17
+ import gradio as gr
18
+ from utils import *
19
+
20
+ proj_dir = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))
21
+ sys.path.append(os.path.join(proj_dir))
22
+
23
+ import craftsman
24
+ from craftsman.systems.base import BaseSystem
25
+ from craftsman.utils.config import ExperimentConfig, load_config
26
+
27
+ from mv_models import GenMVImage
28
+
29
+ _TITLE = '''CraftsMan: High-fidelity Mesh Generation with 3D Native Generation and Interactive Geometry Refiner'''
30
+ _DESCRIPTION = '''
31
+ <div>
32
+ Select or upload an image, then just click 'Generate'.
33
+ <br>
34
+ By mimicking the artist/craftsman modeling workflow, we propose CraftsMan (aka 匠心), which uses a 3D Latent Set Diffusion Model to directly generate coarse meshes,
35
+ then a multi-view normal enhanced image generation model is used to refine the mesh.
36
+ We provide the coarse 3D diffusion part here.
37
+ <br>
38
+ If you find CraftsMan helpful, please help ⭐ the <a href='https://github.com/wyysf-98/CraftsMan/' target='_blank'>GitHub Repo</a>. Thanks!
39
+ <a style="display:inline-block; margin-left: .5em" href='https://github.com/wyysf-98/CraftsMan/'><img src='https://img.shields.io/github/stars/wyysf-98/CraftsMan?style=social' /></a>
40
+ <br>
41
+ *Please note that the model appears flipped in the gradio viewer; download the obj file to get the correctly oriented mesh.
42
+ <br>
43
+ *If you have your own multi-view images, you can upload them directly.
44
+ </div>
45
+ '''
46
+ _CITE_ = r"""
47
+ ---
48
+ 📝 **Citation**
49
+ If you find our work useful for your research or applications, please cite using this bibtex:
50
+ ```bibtex
51
+ @article{craftsman,
52
+ author = {Weiyu Li and Jiarui Liu and Rui Chen and Yixun Liang and Xuelin Chen and Ping Tan and Xiaoxiao Long},
53
+ title = {CraftsMan: High-fidelity Mesh Generation with 3D Native Generation and Interactive Geometry Refiner},
54
+ journal = {arxiv:xxx},
55
+ year = {2024},
56
+ }
57
+ ```
58
+ 🤗 **Acknowledgements**
59
+ We use <a href='https://github.com/wjakob/instant-meshes' target='_blank'>Instant Meshes</a> to remesh the generated mesh to a lower face count, thanks to the authors for the great work.
60
+ 📋 **License**
61
+ CraftsMan is under [AGPL-3.0](https://www.gnu.org/licenses/agpl-3.0.en.html), so any downstream solution and products (including cloud services) that include CraftsMan code or a trained model (both pretrained or custom trained) inside it should be open-sourced to comply with the AGPL conditions. If you have any questions about the usage of CraftsMan, please contact us first.
62
+ 📧 **Contact**
63
+ If you have any questions, feel free to open a discussion or contact us at <b>[email protected]</b>.
64
+ """
65
+
66
+ model = None
67
+ cached_dir = None
68
+
69
+ def image2mesh(view_front: np.ndarray,
70
+ view_right: np.ndarray,
71
+ view_back: np.ndarray,
72
+ view_left: np.ndarray,
73
+ more: bool = False,
74
+ scheduler_name: str = "DDIMScheduler",
75
+ guidance_scale: int = 7.5,
76
+ seed: int = 4,
77
+ octree_depth: int = 7):
78
+
79
+ sample_inputs = {
80
+ "mvimages": [[
81
+ Image.fromarray(view_front),
82
+ Image.fromarray(view_right),
83
+ Image.fromarray(view_back),
84
+ Image.fromarray(view_left)
85
+ ]]
86
+ }
87
+
88
+ global model
89
+ latents = model.sample(
90
+ sample_inputs,
91
+ sample_times=1,
92
+ guidance_scale=guidance_scale,
93
+ return_intermediates=False,
94
+ seed=seed
95
+
96
+ )[0]
97
+
98
+ # decode the latents to mesh
99
+ box_v = 1.1
100
+ mesh_outputs, _ = model.shape_model.extract_geometry(
101
+ latents,
102
+ bounds=[-box_v, -box_v, -box_v, box_v, box_v, box_v],
103
+ octree_depth=octree_depth
104
+ )
105
+ assert len(mesh_outputs) == 1, "Only support single mesh output for gradio demo"
106
+ mesh = trimesh.Trimesh(mesh_outputs[0][0], mesh_outputs[0][1])
107
+ filepath = f"{cached_dir}/{time.time()}.obj"
108
+ mesh.export(filepath, include_normals=True)
109
+
110
+ if 'Remesh' in more:
111
+ print("Remeshing with Instant Meshes...")
112
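+ # target roughly one tenth of the original face count for the remeshed output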
+ target_face_count = int(len(mesh.faces)/10)
113
+ command = f"{proj_dir}/apps/third_party/InstantMeshes {filepath} -f {target_face_count} -d -S 0 -r 6 -p 6 -o {filepath.replace('.obj', '_remeshed.obj')}"
114
+ os.system(command)
115
+ filepath = filepath.replace('.obj', '_remeshed.obj')
116
+
117
+ return filepath
118
+
119
+ if __name__=="__main__":
120
+ parser = argparse.ArgumentParser()
121
+ # parser.add_argument("--model_path", type=str, required=True, help="Path to the object file",)
122
+ parser.add_argument("--cached_dir", type=str, default="./gradio_cached_dir")
123
+ parser.add_argument("--device", type=int, default=0)
124
+ args = parser.parse_args()
125
+
126
+ cached_dir = args.cached_dir
127
+ os.makedirs(args.cached_dir, exist_ok=True)
128
+ device = torch.device(f"cuda:{args.device}" if torch.cuda.is_available() else "cpu")
129
+ print(f"using device: {device}")
130
+
131
+ # for multi-view images generation
132
+ background_choice = OrderedDict({
133
+ "Alpha as Mask": "Alpha as Mask",
134
+ "Auto Remove Background": "Auto Remove Background",
135
+ "Original Image": "Original Image",
136
+ })
137
+ mvimg_model_config_list = ["CRM", "ImageDream", "Wonder3D"]
138
+
139
+ # for 3D latent set diffusion
141
+ ckpt_path = hf_hub_download(repo_id="wyysf/CraftsMan", filename="image-to-shape-diffusion/clip-mvrgb-modln-l256-e64-ne8-nd16-nl6/model.ckpt", repo_type="model")
142
+ config_path = hf_hub_download(repo_id="wyysf/CraftsMan", filename="image-to-shape-diffusion/clip-mvrgb-modln-l256-e64-ne8-nd16-nl6/config.yaml", repo_type="model")
143
+ scheluder_dict = OrderedDict({
144
+ "DDIMScheduler": 'diffusers.schedulers.DDIMScheduler',
145
+ # "DPMSolverMultistepScheduler": 'diffusers.schedulers.DPMSolverMultistepScheduler', # not support yet
146
+ # "UniPCMultistepScheduler": 'diffusers.schedulers.UniPCMultistepScheduler', # not support yet
147
+ })
148
+
149
+ # main GUI
150
+ custom_theme = gr.themes.Soft(primary_hue="blue").set(
151
+ button_secondary_background_fill="*neutral_100",
152
+ button_secondary_background_fill_hover="*neutral_200")
153
+ custom_css = '''#disp_image {
154
+ text-align: center; /* Horizontally center the content */
155
+ }'''
156
+
157
+ with gr.Blocks(title=_TITLE, theme=custom_theme, css=custom_css) as demo:
158
+ with gr.Row():
159
+ with gr.Column(scale=1):
160
+ gr.Markdown('# ' + _TITLE)
161
+ gr.Markdown(_DESCRIPTION)
162
+
163
+ with gr.Row():
164
+ with gr.Column(scale=2):
165
+ with gr.Row():
166
+ image_input = gr.Image(
167
+ label="Image Input",
168
+ image_mode="RGBA",
169
+ sources="upload",
170
+ type="pil",
171
+ )
172
+ with gr.Row():
173
+ text = gr.Textbox(label="Prompt (Optional, only works for mvdream)", visible=False)
174
+ with gr.Row():
175
+ gr.Markdown('''Try a different <b>seed</b> if the result is unsatisfying. Good Luck :)''')
176
+ with gr.Row():
177
+ seed = gr.Number(42, label='Seed', show_label=True)
178
+ more = gr.CheckboxGroup(["Remesh", "Symmetry(TBD)"], label="More", show_label=False)
179
+ # remesh = gr.Checkbox(value=False, label='Remesh')
180
+ # symmetry = gr.Checkbox(value=False, label='Symmetry(TBD)', interactive=False)
181
+ run_btn = gr.Button('Generate', variant='primary', interactive=True)
182
+
183
+ with gr.Row():
184
+ gr.Examples(
185
+ examples=[os.path.join("./apps/examples", i) for i in os.listdir("./apps/examples")],
186
+ inputs=[image_input],
187
+ examples_per_page=8
188
+ )
189
+
190
+ with gr.Column(scale=4):
191
+ with gr.Row():
192
+ output_model_obj = gr.Model3D(
193
+ label="Output Model (OBJ Format)",
194
+ camera_position=(90.0, 90.0, 3.5),
195
+ interactive=False,
196
+ )
197
+
198
+ with gr.Row():
199
+ view_front = gr.Image(label="Front", interactive=True, show_label=True)
200
+ view_right = gr.Image(label="Right", interactive=True, show_label=True)
201
+ view_back = gr.Image(label="Back", interactive=True, show_label=True)
202
+ view_left = gr.Image(label="Left", interactive=True, show_label=True)
203
+
204
+ with gr.Accordion('Advanced options', open=False):
205
+ with gr.Row(equal_height=True):
206
+ run_mv_btn = gr.Button('Only Generate 2D', interactive=True)
207
+ run_3d_btn = gr.Button('Only Generate 3D', interactive=True)
208
+
209
+ with gr.Accordion('Advanced options (2D)', open=False):
210
+ with gr.Row():
211
+ crop_size = gr.Number(224, label='Crop size')
212
+ mvimg_model = gr.Dropdown(value="CRM", label="MV Image Model", choices=mvimg_model_config_list)
213
+
214
+ with gr.Row():
215
+ foreground_ratio = gr.Slider(
216
+ label="Foreground Ratio",
217
+ minimum=0.5,
218
+ maximum=1.0,
219
+ value=1.0,
220
+ step=0.05,
221
+ )
222
+
223
+ with gr.Row():
224
+ background_choice = gr.Dropdown(label="Background Choice", value="Auto Remove Background", choices=list(background_choice.keys()))
225
+ rmbg_type = gr.Dropdown(label="Background Removal Type", value="rembg", choices=['sam', "rembg"])
226
+ backgroud_color = gr.ColorPicker(label="Background Color", value="#FFFFFF", interactive=True)
227
+
228
+ with gr.Row():
229
+ mvimg_guidance_scale = gr.Number(value=3.5, minimum=3, maximum=10, label="2D Guidance Scale")
230
+ mvimg_steps = gr.Number(value=50, minimum=20, maximum=100, label="2D Sample Steps", precision=0)
231
+
232
+ with gr.Accordion('Advanced options (3D)', open=False):
233
+ with gr.Row():
234
+ guidance_scale = gr.Number(label="3D Guidance Scale", value=7.5, minimum=3.0, maximum=10.0)
235
+ steps = gr.Number(value=50, minimum=20, maximum=100, label="3D Sample Steps", precision=0)
236
+
237
+ with gr.Row():
238
+ scheduler = gr.Dropdown(label="Scheduler", value="DDIMScheduler", choices=list(scheluder_dict.keys()))
239
+ octree_depth = gr.Slider(label="Octree Depth", value=7, minimum=4, maximum=8, step=1)
240
+
241
+ gr.Markdown(_CITE_)
242
+
243
+ outputs = [output_model_obj]
244
+ rmbg = RMBG(device)
245
+
246
+ gen_mvimg = GenMVImage(device)
247
+ model = load_model(ckpt_path, config_path, device)
248
+
249
+ run_btn.click(fn=check_input_image, inputs=[image_input]
250
+ ).success(
251
+ fn=rmbg.run,
252
+ inputs=[rmbg_type, image_input, crop_size, foreground_ratio, background_choice, backgroud_color],
253
+ outputs=[image_input]
254
+ ).success(
255
+ fn=gen_mvimg.run,
256
+ inputs=[mvimg_model, text, image_input, crop_size, seed, mvimg_guidance_scale, mvimg_steps],
257
+ outputs=[view_front, view_right, view_back, view_left]
258
+ ).success(
259
+ fn=image2mesh,
260
+ inputs=[view_front, view_right, view_back, view_left, more, scheduler, guidance_scale, seed, octree_depth],
261
+ outputs=outputs,
262
+ api_name="generate_img2obj")
263
+ run_mv_btn.click(fn=gen_mvimg.run,
264
+ inputs=[mvimg_model, text, image_input, crop_size, seed, mvimg_guidance_scale, mvimg_steps],
265
+ outputs=[view_front, view_right, view_back, view_left]
266
+ )
267
+ run_3d_btn.click(fn=image2mesh,
268
+ inputs=[view_front, view_right, view_back, view_left, more, scheduler, guidance_scale, seed, octree_depth],
269
+ outputs=outputs,
270
+ api_name="generate_img2obj")
271
+
272
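+ # allowed_paths lets Gradio serve the generated .obj files from the cache directory; share=True also creates a public link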
+ demo.queue().launch(share=True, allowed_paths=[args.cached_dir])
apps/mv_models.py ADDED
@@ -0,0 +1,162 @@
1
+ import gradio as gr
2
+ import numpy as np
3
+ import torch
4
+ import PIL
5
+ from PIL import Image
6
+ import os
7
+ import sys
8
+ import rembg
9
+ import time
10
+ import json
11
+ import cv2
12
+ from datetime import datetime
13
+ from einops import repeat, rearrange
14
+ from omegaconf import OmegaConf
15
+ from typing import Dict, Optional, Tuple, List
16
+ from dataclasses import dataclass
17
+ from .utils import *
18
+ from huggingface_hub import hf_hub_download
19
+
20
+ parent_dir = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))
21
+
22
+ class GenMVImage(object):
23
+ def __init__(self, device):
24
+ self.seed = 1024
25
+ self.guidance_scale = 7.5
26
+ self.step = 50
27
+ self.pipelines = {}
28
+ self.device = device
29
+
30
+ def gen_image_from_crm(self, image):
31
+
32
+ from .third_party.CRM.pipelines import TwoStagePipeline
33
+ specs = json.load(open(f"{parent_dir}/apps/third_party/CRM/configs/specs_objaverse_total.json"))
34
+ stage1_config = OmegaConf.load(f"{parent_dir}/apps/third_party/CRM/configs/nf7_v3_SNR_rd_size_stroke.yaml").config
35
+ stage1_sampler_config = stage1_config.sampler
36
+ stage1_model_config = stage1_config.models
37
+ stage1_model_config.resume = hf_hub_download(repo_id="Zhengyi/CRM", filename="pixel-diffusion.pth", repo_type="model")
38
+ stage1_model_config.config = f"{parent_dir}/apps/third_party/CRM/" + stage1_model_config.config
39
+ if "crm" in self.pipelines.keys():
40
+ pipeline = self.pipelines['crm']
41
+ else:
42
+ self.pipelines['crm'] = TwoStagePipeline(
43
+ stage1_model_config,
44
+ stage1_sampler_config,
45
+ device=self.device,
46
+ dtype=torch.float16
47
+ )
48
+ pipeline = self.pipelines['crm']
49
+ pipeline.set_seed(self.seed)
50
+ rt_dict = pipeline(image, scale=self.guidance_scale, step=self.step)
51
+ mv_imgs = rt_dict["stage1_images"]
52
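+ # the CRM pipeline returns six views; indices 5, 3, 2, 0 are the (front, right, back, left) views consumed by the caller in apps/gradio_app.py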
+ return mv_imgs[5], mv_imgs[3], mv_imgs[2], mv_imgs[0]
53
+
54
+ def gen_image_from_mvdream(self, image, text):
55
+ from .third_party.mvdream_diffusers.pipeline_mvdream import MVDreamPipeline
56
+ if image is None:
57
+ if "mvdream" in self.pipelines.keys():
58
+ pipe_MVDream = self.pipelines['mvdream']
59
+ else:
60
+ self.pipelines['mvdream'] = MVDreamPipeline.from_pretrained(
61
+ "ashawkey/mvdream-sd2.1-diffusers", # remote weights
62
+ torch_dtype=torch.float16,
63
+ trust_remote_code=True,
64
+ )
65
+ self.pipelines['mvdream'] = self.pipelines['mvdream'].to(self.device)
66
+ pipe_MVDream = self.pipelines['mvdream']
67
+ mv_imgs = pipe_MVDream(
68
+ text,
69
+ negative_prompt="ugly, deformed, disfigured, poor details, bad anatomy",
70
+ num_inference_steps=self.step,
71
+ guidance_scale=self.guidance_scale,
72
+ generator = torch.Generator(self.device).manual_seed(self.seed)
73
+ )
74
+ else:
75
+ image = np.array(image)
76
+ image = image.astype(np.float32) / 255.0
77
+ image = image[..., :3] * image[..., 3:4] + (1 - image[..., 3:4])
78
+ if "imagedream" in self.pipelines.keys():
79
+ pipe_imagedream = self.pipelines['imagedream']
80
+ else:
81
+ self.pipelines['imagedream'] = MVDreamPipeline.from_pretrained(
82
+ "ashawkey/imagedream-ipmv-diffusers", # remote weights
83
+ torch_dtype=torch.float16,
84
+ trust_remote_code=True,
85
+ )
86
+ self.pipelines['imagedream'] = self.pipelines['imagedream'].to(self.device)
87
+ pipe_imagedream = self.pipelines['imagedream']
88
+ mv_imgs = pipe_imagedream(
89
+ text,
90
+ image,
91
+ negative_prompt="ugly, deformed, disfigured, poor details, bad anatomy",
92
+ num_inference_steps=self.step,
93
+ guidance_scale=self.guidance_scale,
94
+ generator = torch.Generator(self.device).manual_seed(self.seed)
95
+ )
96
+ return mv_imgs[1], mv_imgs[2], mv_imgs[3], mv_imgs[0]
97
+
98
+ def gen_image_from_wonder3d(self, image, crop_size):
99
+ sys.path.append(f"{parent_dir}/apps/third_party/Wonder3D")
100
+ from .third_party.Wonder3D.mvdiffusion.pipelines.pipeline_mvdiffusion_image import MVDiffusionImagePipeline
101
+ weight_dtype = torch.float16
102
+ batch = prepare_data(image, crop_size)
103
+
104
+ if "wonder3d" in self.pipelines.keys():
105
+ pipeline = self.pipelines['wonder3d']
106
+ else:
107
+ self.pipelines['wonder3d'] = MVDiffusionImagePipeline.from_pretrained(
108
+ 'flamehaze1115/wonder3d-v1.0',
109
+ custom_pipeline=f'{parent_dir}/apps/third_party/Wonder3D/mvdiffusion/pipelines/pipeline_mvdiffusion_image.py',
110
+ torch_dtype=weight_dtype
111
+ )
112
+ self.pipelines['wonder3d'].unet.enable_xformers_memory_efficient_attention()
113
+ self.pipelines['wonder3d'].to(self.device)
114
+ self.pipelines['wonder3d'].set_progress_bar_config(disable=True)
115
+ pipeline = self.pipelines['wonder3d']
116
+
117
+ generator = torch.Generator(device=pipeline.unet.device).manual_seed(self.seed)
118
+ # repeat (2B, Nv, 3, H, W)
119
+ imgs_in = torch.cat([batch['imgs_in']] * 2, dim=0).to(weight_dtype)
120
+
121
+ # (2B, Nv, Nce)
122
+ camera_embeddings = torch.cat([batch['camera_embeddings']] * 2, dim=0).to(weight_dtype)
123
+
124
+ task_embeddings = torch.cat([batch['normal_task_embeddings'], batch['color_task_embeddings']], dim=0).to(weight_dtype)
125
+
126
+ camera_embeddings = torch.cat([camera_embeddings, task_embeddings], dim=-1).to(weight_dtype)
127
+
128
+ # (B*Nv, 3, H, W)
129
+ imgs_in = rearrange(imgs_in, "Nv C H W -> (Nv) C H W")
130
+ # (B*Nv, Nce)
131
+
132
+ out = pipeline(
133
+ imgs_in,
134
+ # camera_embeddings,
135
+ generator=generator,
136
+ guidance_scale=self.guidance_scale,
137
+ num_inference_steps=self.step,
138
+ output_type='pt',
139
+ num_images_per_prompt=1,
140
+ **{'eta': 1.0},
141
+ ).images
142
+
143
+ bsz = out.shape[0] // 2
144
+ normals_pred = out[:bsz]
145
+ images_pred = out[bsz:]
146
+
147
+ normals_pred = [save_image(normals_pred[i]) for i in range(bsz)]
148
+ images_pred = [save_image(images_pred[i]) for i in range(bsz)]
149
+
150
+ mv_imgs = images_pred
151
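+ # pick four of the six predicted color views to fill the (front, right, back, left) slots expected by the caller in apps/gradio_app.py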
+ return mv_imgs[0], mv_imgs[2], mv_imgs[4], mv_imgs[5]
152
+
153
+ def run(self, mvimg_model, text, image, crop_size, seed, guidance_scale, step):
154
+ self.seed = seed
155
+ self.guidance_scale = guidance_scale
156
+ self.step = step
157
+ if mvimg_model.upper() == "CRM":
158
+ return self.gen_image_from_crm(image)
159
+ elif mvimg_model.upper() == "IMAGEDREAM":
160
+ return self.gen_image_from_mvdream(image, text)
161
+ elif mvimg_model.upper() == "WONDER3D":
162
+ return self.gen_image_from_wonder3d(image, crop_size)
apps/third_party/CRM/.gitignore ADDED
@@ -0,0 +1,155 @@
1
+ # Byte-compiled / optimized / DLL files
2
+ __pycache__/
3
+ *.py[cod]
4
+ *$py.class
5
+
6
+ # C extensions
7
+ *.so
8
+
9
+ # Distribution / packaging
10
+ .Python
11
+ build/
12
+ develop-eggs/
13
+ dist/
14
+ downloads/
15
+ eggs/
16
+ .eggs/
17
+ lib/
18
+ lib64/
19
+ parts/
20
+ sdist/
21
+ var/
22
+ wheels/
23
+ share/python-wheels/
24
+ *.egg-info/
25
+ .installed.cfg
26
+ *.egg
27
+ MANIFEST
28
+
29
+ # PyInstaller
30
+ # Usually these files are written by a python script from a template
31
+ # before PyInstaller builds the exe, so as to inject date/other infos into it.
32
+ *.manifest
33
+ *.spec
34
+
35
+ # Installer logs
36
+ pip-log.txt
37
+ pip-delete-this-directory.txt
38
+
39
+ # Unit test / coverage reports
40
+ htmlcov/
41
+ .tox/
42
+ .nox/
43
+ .coverage
44
+ .coverage.*
45
+ .cache
46
+ nosetests.xml
47
+ coverage.xml
48
+ *.cover
49
+ *.py,cover
50
+ .hypothesis/
51
+ .pytest_cache/
52
+ cover/
53
+
54
+ # Translations
55
+ *.mo
56
+ *.pot
57
+
58
+ # Django stuff:
59
+ *.log
60
+ local_settings.py
61
+ db.sqlite3
62
+ db.sqlite3-journal
63
+
64
+ # Flask stuff:
65
+ instance/
66
+ .webassets-cache
67
+
68
+ # Scrapy stuff:
69
+ .scrapy
70
+
71
+ # Sphinx documentation
72
+ docs/_build/
73
+
74
+ # PyBuilder
75
+ .pybuilder/
76
+ target/
77
+
78
+ # Jupyter Notebook
79
+ .ipynb_checkpoints
80
+
81
+ # IPython
82
+ profile_default/
83
+ ipython_config.py
84
+
85
+ # pyenv
86
+ # For a library or package, you might want to ignore these files since the code is
87
+ # intended to run in multiple environments; otherwise, check them in:
88
+ # .python-version
89
+
90
+ # pipenv
91
+ # According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
92
+ # However, in case of collaboration, if having platform-specific dependencies or dependencies
93
+ # having no cross-platform support, pipenv may install dependencies that don't work, or not
94
+ # install all needed dependencies.
95
+ #Pipfile.lock
96
+
97
+ # poetry
98
+ # Similar to Pipfile.lock, it is generally recommended to include poetry.lock in version control.
99
+ # This is especially recommended for binary packages to ensure reproducibility, and is more
100
+ # commonly ignored for libraries.
101
+ # https://python-poetry.org/docs/basic-usage/#commit-your-poetrylock-file-to-version-control
102
+ #poetry.lock
103
+
104
+ # pdm
105
+ # Similar to Pipfile.lock, it is generally recommended to include pdm.lock in version control.
106
+ #pdm.lock
107
+ # pdm stores project-wide configurations in .pdm.toml, but it is recommended to not include it
108
+ # in version control.
109
+ # https://pdm.fming.dev/#use-with-ide
110
+ .pdm.toml
111
+
112
+ # PEP 582; used by e.g. github.com/David-OConnor/pyflow and github.com/pdm-project/pdm
113
+ __pypackages__/
114
+
115
+ # Celery stuff
116
+ celerybeat-schedule
117
+ celerybeat.pid
118
+
119
+ # SageMath parsed files
120
+ *.sage.py
121
+
122
+ # Environments
123
+ .env
124
+ .venv
125
+ env/
126
+ venv/
127
+ ENV/
128
+ env.bak/
129
+ venv.bak/
130
+
131
+ # Spyder project settings
132
+ .spyderproject
133
+ .spyproject
134
+
135
+ # Rope project settings
136
+ .ropeproject
137
+
138
+ # mkdocs documentation
139
+ /site
140
+
141
+ # mypy
142
+ .mypy_cache/
143
+ .dmypy.json
144
+ dmypy.json
145
+
146
+ # Pyre type checker
147
+ .pyre/
148
+
149
+ # pytype static type analyzer
150
+ .pytype/
151
+
152
+ # Cython debug symbols
153
+ cython_debug/
154
+
155
+ out/
apps/third_party/CRM/LICENSE ADDED
@@ -0,0 +1,21 @@
1
+ MIT License
2
+
3
+ Copyright (c) 2024 TSAIL group
4
+
5
+ Permission is hereby granted, free of charge, to any person obtaining a copy
6
+ of this software and associated documentation files (the "Software"), to deal
7
+ in the Software without restriction, including without limitation the rights
8
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9
+ copies of the Software, and to permit persons to whom the Software is
10
+ furnished to do so, subject to the following conditions:
11
+
12
+ The above copyright notice and this permission notice shall be included in all
13
+ copies or substantial portions of the Software.
14
+
15
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21
+ SOFTWARE.
apps/third_party/CRM/README.md ADDED
@@ -0,0 +1,85 @@
1
+ # Convolutional Reconstruction Model
2
+
3
+ Official implementation for *CRM: Single Image to 3D Textured Mesh with Convolutional Reconstruction Model*.
4
+
5
+ **CRM is a feed-forward model which can generate a 3D textured mesh in 10 seconds.**
6
+
7
+ ## [Project Page](https://ml.cs.tsinghua.edu.cn/~zhengyi/CRM/) | [Arxiv](https://arxiv.org/abs/2403.05034) | [HF-Demo](https://huggingface.co/spaces/Zhengyi/CRM) | [Weights](https://huggingface.co/Zhengyi/CRM)
8
+
9
+ https://github.com/thu-ml/CRM/assets/40787266/8b325bc0-aa74-4c26-92e8-a8f0c1079382
10
+
11
+ ## Try CRM 🍻
12
+ * Try CRM at [Huggingface Demo](https://huggingface.co/spaces/Zhengyi/CRM).
13
+ * Try CRM at [Replicate Demo](https://replicate.com/camenduru/crm). Thanks [@camenduru](https://github.com/camenduru)!
14
+
15
+ ## Install
16
+
17
+ ### Step 1 - Base
18
+
19
+ Install the packages one by one; we use **python 3.9**.
20
+
21
+ ```bash
22
+ pip install torch==1.13.0+cu117 torchvision==0.14.0+cu117 torchaudio==0.13.0 --extra-index-url https://download.pytorch.org/whl/cu117
23
+ pip install torch-scatter==2.1.1 -f https://data.pyg.org/whl/torch-1.13.1+cu117.html
24
+ pip install kaolin==0.14.0 -f https://nvidia-kaolin.s3.us-east-2.amazonaws.com/torch-1.13.1_cu117.html
25
+ pip install -r requirements.txt
26
+ ```
27
+
28
+ Besides, you need to install xformers manually according to the official [doc](https://github.com/facebookresearch/xformers?tab=readme-ov-file#installing-xformers) (**not needed when installing via conda**), e.g.
29
+
30
+ ```bash
31
+ pip install ninja
32
+ pip install -v -U git+https://github.com/facebookresearch/xformers.git@main#egg=xformers
33
+ ```
34
+
35
+ ### Step 2 - Nvdiffrast
36
+
37
+ Install nvdiffrast according to the official [doc](https://nvlabs.github.io/nvdiffrast/#installation), e.g.
38
+
39
+ ```bash
40
+ pip install git+https://github.com/NVlabs/nvdiffrast
41
+ ```
42
+
43
+
44
+
45
+ ## Inference
46
+
47
+ We suggest using gradio for visualized inference.
48
+
49
+ ```
50
+ gradio app.py
51
+ ```
52
+
53
+ ![image](https://github.com/thu-ml/CRM/assets/40787266/4354d22a-a641-4531-8408-c761ead8b1a2)
54
+
55
+ For inference in command lines, simply run
56
+ ```bash
57
+ CUDA_VISIBLE_DEVICES="0" python run.py --inputdir "examples/kunkun.webp"
58
+ ```
59
+ It will output the preprocessed image, the generated 6-view images and CCMs, and a 3D model in obj format.
60
+
61
+ **Tips:** (1) If the result is unsatisfactory, please check whether the input image is correctly pre-processed onto a grey background. Otherwise the results will be unpredictable.
62
+ (2) Different from the [Huggingface Demo](https://huggingface.co/spaces/Zhengyi/CRM), this official implementation uses UV textures instead of vertex colors. It produces better textures than the online demo but takes longer owing to the UV texturing.
63
+
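+ For reference, a minimal (unofficial) pre-processing sketch that removes the background with rembg and composites the object onto the grey background mentioned in tip (1); the (127, 127, 127) value mirrors the demo's default `#7F7F7F` background color, and the input path is just the example image used above:
+
+ ```python
+ # Illustrative only; the official pre-processing lives in app.py / run.py.
+ import rembg
+ from PIL import Image
+
+ img = rembg.remove(Image.open("examples/kunkun.webp").convert("RGBA"))  # transparent background
+ grey = Image.new("RGBA", img.size, (127, 127, 127, 255))
+ Image.alpha_composite(grey, img).convert("RGB").save("kunkun_grey.png")
+ ```
+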
64
+ ## Todo List
65
+ - [x] Release inference code.
66
+ - [x] Release pretrained models.
67
+ - [ ] Optimize inference code to fit on low-memory GPUs.
68
+ - [ ] Upload training code.
69
+
70
+ ## Acknowledgement
71
+ - [ImageDream](https://github.com/bytedance/ImageDream)
72
+ - [nvdiffrast](https://github.com/NVlabs/nvdiffrast)
73
+ - [kiuikit](https://github.com/ashawkey/kiuikit)
74
+ - [GET3D](https://github.com/nv-tlabs/GET3D)
75
+
76
+ ## Citation
77
+
78
+ ```
79
+ @article{wang2024crm,
80
+ title={CRM: Single Image to 3D Textured Mesh with Convolutional Reconstruction Model},
81
+ author={Zhengyi Wang and Yikai Wang and Yifei Chen and Chendong Xiang and Shuo Chen and Dajiang Yu and Chongxuan Li and Hang Su and Jun Zhu},
82
+ journal={arXiv preprint arXiv:2403.05034},
83
+ year={2024}
84
+ }
85
+ ```
apps/third_party/CRM/__init__.py ADDED
File without changes
apps/third_party/CRM/app.py ADDED
@@ -0,0 +1,228 @@
1
+ # Not ready to use yet
2
+ import argparse
3
+ import numpy as np
4
+ import gradio as gr
5
+ from omegaconf import OmegaConf
6
+ import torch
7
+ from PIL import Image
8
+ import PIL
9
+ from pipelines import TwoStagePipeline
10
+ from huggingface_hub import hf_hub_download
11
+ import os
12
+ import rembg
13
+ from typing import Any
14
+ import json
15
+ import os
16
+ import json
17
+ import argparse
18
+
19
+ from model import CRM
20
+ from inference import generate3d
21
+
22
+ pipeline = None
23
+ rembg_session = rembg.new_session()
24
+
25
+
26
+ def expand_to_square(image, bg_color=(0, 0, 0, 0)):
27
+ # expand image to 1:1
28
+ width, height = image.size
29
+ if width == height:
30
+ return image
31
+ new_size = (max(width, height), max(width, height))
32
+ new_image = Image.new("RGBA", new_size, bg_color)
33
+ paste_position = ((new_size[0] - width) // 2, (new_size[1] - height) // 2)
34
+ new_image.paste(image, paste_position)
35
+ return new_image
36
+
37
+ def check_input_image(input_image):
38
+ if input_image is None:
39
+ raise gr.Error("No image uploaded!")
40
+
41
+
42
+ def remove_background(
43
+ image: PIL.Image.Image,
44
+ rembg_session = None,
45
+ force: bool = False,
46
+ **rembg_kwargs,
47
+ ) -> PIL.Image.Image:
48
+ do_remove = True
49
+ if image.mode == "RGBA" and image.getextrema()[3][0] < 255:
50
+ # if the image already carries a usable alpha channel, skip background removal
51
+ print("alhpa channl not enpty, skip remove background, using alpha channel as mask")
52
+ background = Image.new("RGBA", image.size, (0, 0, 0, 0))
53
+ image = Image.alpha_composite(background, image)
54
+ do_remove = False
55
+ do_remove = do_remove or force
56
+ if do_remove:
57
+ image = rembg.remove(image, session=rembg_session, **rembg_kwargs)
58
+ return image
59
+
60
+ def do_resize_content(original_image: Image, scale_rate):
61
+ # resize image content while retaining the original image size
62
+ if scale_rate != 1:
63
+ # Calculate the new size after rescaling
64
+ new_size = tuple(int(dim * scale_rate) for dim in original_image.size)
65
+ # Resize the image while maintaining the aspect ratio
66
+ resized_image = original_image.resize(new_size)
67
+ # Create a new image with the original size and a transparent background
68
+ padded_image = Image.new("RGBA", original_image.size, (0, 0, 0, 0))
69
+ paste_position = ((original_image.width - resized_image.width) // 2, (original_image.height - resized_image.height) // 2)
70
+ padded_image.paste(resized_image, paste_position)
71
+ return padded_image
72
+ else:
73
+ return original_image
74
+
75
+ def add_background(image, bg_color=(255, 255, 255)):
76
+ # given an RGBA image, alpha channel is used as mask to add background color
77
+ background = Image.new("RGBA", image.size, bg_color)
78
+ return Image.alpha_composite(background, image)
79
+
80
+
81
+ def preprocess_image(image, background_choice, foreground_ratio, backgroud_color):
82
+ """
83
+ input image is a pil image in RGBA, return RGB image
84
+ """
85
+ print(background_choice)
86
+ if background_choice == "Alpha as mask":
87
+ background = Image.new("RGBA", image.size, (0, 0, 0, 0))
88
+ image = Image.alpha_composite(background, image)
89
+ else:
90
+ image = remove_background(image, rembg_session, force=True)
91
+ image = do_resize_content(image, foreground_ratio)
92
+ image = expand_to_square(image)
93
+ image = add_background(image, backgroud_color)
94
+ return image.convert("RGB")
95
+
96
+
97
+ def gen_image(input_image, seed, scale, step):
98
+ global pipeline, model, args
99
+ pipeline.set_seed(seed)
100
+ rt_dict = pipeline(input_image, scale=scale, step=step)
101
+ stage1_images = rt_dict["stage1_images"]
102
+ stage2_images = rt_dict["stage2_images"]
103
+ np_imgs = np.concatenate(stage1_images, 1)
104
+ np_xyzs = np.concatenate(stage2_images, 1)
105
+
106
+ glb_path, obj_path = generate3d(model, np_imgs, np_xyzs, args.device)
107
+ return Image.fromarray(np_imgs), Image.fromarray(np_xyzs), glb_path, obj_path
108
+
109
+
110
+ parser = argparse.ArgumentParser()
111
+ parser.add_argument(
112
+ "--stage1_config",
113
+ type=str,
114
+ default="configs/nf7_v3_SNR_rd_size_stroke.yaml",
115
+ help="config for stage1",
116
+ )
117
+ parser.add_argument(
118
+ "--stage2_config",
119
+ type=str,
120
+ default="configs/stage2-v2-snr.yaml",
121
+ help="config for stage2",
122
+ )
123
+
124
+ parser.add_argument("--device", type=str, default="cuda")
125
+ args = parser.parse_args()
126
+
127
+ crm_path = hf_hub_download(repo_id="Zhengyi/CRM", filename="CRM.pth")
128
+ specs = json.load(open("configs/specs_objaverse_total.json"))
129
+ model = CRM(specs).to(args.device)
130
+ model.load_state_dict(torch.load(crm_path, map_location = args.device), strict=False)
131
+
132
+ stage1_config = OmegaConf.load(args.stage1_config).config
133
+ stage2_config = OmegaConf.load(args.stage2_config).config
134
+ stage2_sampler_config = stage2_config.sampler
135
+ stage1_sampler_config = stage1_config.sampler
136
+
137
+ stage1_model_config = stage1_config.models
138
+ stage2_model_config = stage2_config.models
139
+
140
+ xyz_path = hf_hub_download(repo_id="Zhengyi/CRM", filename="ccm-diffusion.pth")
141
+ pixel_path = hf_hub_download(repo_id="Zhengyi/CRM", filename="pixel-diffusion.pth")
142
+ stage1_model_config.resume = pixel_path
143
+ stage2_model_config.resume = xyz_path
144
+
145
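+ # stage 1 (pixel-diffusion.pth) generates the six RGB views; stage 2 (ccm-diffusion.pth) generates the corresponding coordinate maps (CCMs)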
+ pipeline = TwoStagePipeline(
146
+ stage1_model_config,
147
+ stage2_model_config,
148
+ stage1_sampler_config,
149
+ stage2_sampler_config,
150
+ device=args.device,
151
+ dtype=torch.float16
152
+ )
153
+
154
+ with gr.Blocks() as demo:
155
+ gr.Markdown("# CRM: Single Image to 3D Textured Mesh with Convolutional Reconstruction Model")
156
+ with gr.Row():
157
+ with gr.Column():
158
+ with gr.Row():
159
+ image_input = gr.Image(
160
+ label="Image input",
161
+ image_mode="RGBA",
162
+ sources="upload",
163
+ type="pil",
164
+ )
165
+ processed_image = gr.Image(label="Processed Image", interactive=False, type="pil", image_mode="RGB")
166
+ with gr.Row():
167
+ with gr.Column():
168
+ with gr.Row():
169
+ background_choice = gr.Radio([
170
+ "Alpha as mask",
171
+ "Auto Remove background"
172
+ ], value="Auto Remove background",
173
+ label="backgroud choice")
174
+ # do_remove_background = gr.Checkbox(label=, value=True)
175
+ # force_remove = gr.Checkbox(label=, value=False)
176
+ back_groud_color = gr.ColorPicker(label="Background Color", value="#7F7F7F", interactive=False)
177
+ foreground_ratio = gr.Slider(
178
+ label="Foreground Ratio",
179
+ minimum=0.5,
180
+ maximum=1.0,
181
+ value=1.0,
182
+ step=0.05,
183
+ )
184
+
185
+ with gr.Column():
186
+ seed = gr.Number(value=1234, label="seed", precision=0)
187
+ guidance_scale = gr.Number(value=5.5, minimum=3, maximum=10, label="guidance_scale")
188
+ step = gr.Number(value=50, minimum=30, maximum=100, label="sample steps", precision=0)
189
+ text_button = gr.Button("Generate 3D shape")
190
+ gr.Examples(
191
+ examples=[os.path.join("examples", i) for i in os.listdir("examples")],
192
+ inputs=[image_input],
193
+ )
194
+ with gr.Column():
195
+ image_output = gr.Image(interactive=False, label="Output RGB image")
196
+ xyz_ouput = gr.Image(interactive=False, label="Output CCM image")
197
+
198
+ output_model = gr.Model3D(
199
+ label="Output GLB",
200
+ interactive=False,
201
+ )
202
+ gr.Markdown("Note: The GLB model shown here has a darker lighting and enlarged UV seams. Download for correct results.")
203
+ output_obj = gr.File(interactive=False, label="Output OBJ")
204
+
205
+ inputs = [
206
+ processed_image,
207
+ seed,
208
+ guidance_scale,
209
+ step,
210
+ ]
211
+ outputs = [
212
+ image_output,
213
+ xyz_ouput,
214
+ output_model,
215
+ output_obj,
216
+ ]
217
+
218
+
219
+ text_button.click(fn=check_input_image, inputs=[image_input]).success(
220
+ fn=preprocess_image,
221
+ inputs=[image_input, background_choice, foreground_ratio, back_groud_color],
222
+ outputs=[processed_image],
223
+ ).success(
224
+ fn=gen_image,
225
+ inputs=inputs,
226
+ outputs=outputs,
227
+ )
228
+ demo.queue().launch()