|
# InternImage for Semantic Segmentation |
|
|
|
This folder contains the implementation of the InternImage for semantic segmentation. |
|
|
|
Our segmentation code is developed on top of [MMSegmentation v0.27.0](https://github.com/open-mmlab/mmsegmentation/tree/v0.27.0). |
|
|
|
<!-- TOC --> |
|
|
|
- [Installation](#installation) |
|
- [Data Preparation](#data-preparation) |
|
- [Released Models](#released-models) |
|
- [Evaluation](#evaluation) |
|
- [Training](#training) |
|
- [Manage Jobs with Slurm](#manage-jobs-with-slurm) |
|
- [Image Demo](#image-demo) |
|
- [Export](#export) |
|
|
|
<!-- TOC --> |
|
|
|
## Installation |
|
|
|
- Clone this repository: |
|
|
|
```bash |
|
git clone https://github.com/OpenGVLab/InternImage.git |
|
cd InternImage |
|
``` |
|
|
|
- Create a conda virtual environment and activate it: |
|
|
|
```bash |
|
conda create -n internimage python=3.9 |
|
conda activate internimage |
|
``` |
|
|
|
- Install `CUDA>=10.2` with `cudnn>=7` following |
|
the [official installation instructions](https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html) |
|
- Install `PyTorch>=1.10.0` and `torchvision>=0.9.0` with `CUDA>=10.2`: |
|
|
|
For examples, to install `torch==1.11` with `CUDA==11.3`: |
|
|
|
```bash |
|
pip install torch==1.11.0+cu113 torchvision==0.12.0+cu113 -f https://download.pytorch.org/whl/torch_stable.html |
|
``` |
|
|
|
- Install other requirements: |
|
|
|
note: conda opencv will break torchvision as not to support GPU, so we need to install opencv using pip. |
|
|
|
```bash |
|
conda install -c conda-forge termcolor yacs pyyaml scipy pip -y |
|
pip install opencv-python |
|
``` |
|
|
|
- Install `timm`, `mmcv-full` and \`mmsegmentation': |
|
|
|
```bash |
|
pip install -U openmim |
|
mim install mmcv-full==1.5.0 |
|
mim install mmsegmentation==0.27.0 |
|
pip install timm==0.6.11 mmdet==2.28.1 |
|
``` |
|
|
|
- Install other requirements: |
|
|
|
```bash |
|
pip install opencv-python termcolor yacs pyyaml scipy |
|
# Please use a version of numpy lower than 2.0 |
|
pip install numpy==1.26.4 |
|
pip install pydantic==1.10.13 |
|
``` |
|
|
|
- Compile CUDA operators |
|
|
|
Before compiling, please use the `nvcc -V` command to check whether your `nvcc` version matches the CUDA version of PyTorch. |
|
|
|
```bash |
|
cd ./ops_dcnv3 |
|
sh ./make.sh |
|
# unit test (should see all checking is True) |
|
python test.py |
|
``` |
|
|
|
- You can also install the operator using precompiled `.whl` files |
|
[DCNv3-1.0-whl](https://github.com/OpenGVLab/InternImage/releases/tag/whl_files) |
|
|
|
## Data Preparation |
|
|
|
Prepare datasets according to the [guidelines](https://github.com/open-mmlab/mmsegmentation/blob/master/docs/en/dataset_prepare.md#prepare-datasets) in MMSegmentation. |
|
|
|
## Released Models |
|
|
|
<details open> |
|
<summary> Dataset: ADE20K </summary> |
|
<br> |
|
<div> |
|
|
|
| method | backbone | resolution | mIoU (ss/ms) | #param | FLOPs | Config | Download | |
|
| :---------: | :------------: | :--------: | :----------: | :----: | :---: | :---------------------------------------------------------------------------------: | :--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | |
|
| UperNet | InternImage-T | 512x512 | 47.9 / 48.1 | 59M | 944G | [config](./configs/ade20k/upernet_internimage_t_512_160k_ade20k.py) | [ckpt](https://huggingface.co/OpenGVLab/InternImage/resolve/main/upernet_internimage_t_512_160k_ade20k.pth) \| [log](https://huggingface.co/OpenGVLab/InternImage/raw/main/upernet_internimage_t_512_160k_ade20k.log.json) | |
|
| UperNet | InternImage-S | 512x512 | 50.1 / 50.9 | 80M | 1017G | [config](./configs/ade20k/upernet_internimage_s_512_160k_ade20k.py) | [ckpt](https://huggingface.co/OpenGVLab/InternImage/resolve/main/upernet_internimage_s_512_160k_ade20k.pth) \| [log](https://huggingface.co/OpenGVLab/InternImage/raw/main/upernet_internimage_s_512_160k_ade20k.log.json) | |
|
| UperNet | InternImage-B | 512x512 | 50.8 / 51.3 | 128M | 1185G | [config](./configs/ade20k/upernet_internimage_b_512_160k_ade20k.py) | [ckpt](https://huggingface.co/OpenGVLab/InternImage/resolve/main/upernet_internimage_b_512_160k_ade20k.pth) \| [log](https://huggingface.co/OpenGVLab/InternImage/raw/main/upernet_internimage_b_512_160k_ade20k.log.json) | |
|
| UperNet | InternImage-L | 640x640 | 53.9 / 54.1 | 256M | 2526G | [config](./configs/ade20k/upernet_internimage_l_640_160k_ade20k.py) | [ckpt](https://huggingface.co/OpenGVLab/InternImage/resolve/main/upernet_internimage_l_640_160k_ade20k.pth) \| [log](https://huggingface.co/OpenGVLab/InternImage/raw/main/upernet_internimage_l_640_160k_ade20k.log.json) | |
|
| UperNet | InternImage-XL | 640x640 | 55.0 / 55.3 | 368M | 3142G | [config](./configs/ade20k/upernet_internimage_xl_640_160k_ade20k.py) | [ckpt](https://huggingface.co/OpenGVLab/InternImage/resolve/main/upernet_internimage_xl_640_160k_ade20k.pth) \| [log](https://huggingface.co/OpenGVLab/InternImage/raw/main/upernet_internimage_xl_640_160k_ade20k.log.json) | |
|
| UperNet | InternImage-H | 896x896 | 59.9 / 60.3 | 1.12B | 3566G | [config](./configs/ade20k/upernet_internimage_h_896_160k_ade20k.py) | [ckpt](https://huggingface.co/OpenGVLab/InternImage/resolve/main/upernet_internimage_h_896_160k_ade20k.pth) \| [log](https://huggingface.co/OpenGVLab/InternImage/raw/main/upernet_internimage_h_896_160k_ade20k.log.json) | |
|
| Mask2Former | InternImage-H | 896x896 | 62.6 / 62.9 | 1.31B | 4635G | [config](./configs/ade20k/mask2former_internimage_h_896_80k_cocostuff2ade20k_ss.py) | [ckpt](https://huggingface.co/OpenGVLab/InternImage/resolve/main/mask2former_internimage_h_896_80k_cocostuff2ade20k.pth) \| [log](https://huggingface.co/OpenGVLab/InternImage/raw/main/mask2former_internimage_h_896_80k_cocostuff2ade20k.log.json) | |
|
|
|
</div> |
|
|
|
</details> |
|
|
|
<details> |
|
<summary> Dataset: Cityscapes </summary> |
|
<br> |
|
<div> |
|
|
|
| method | backbone | resolution | mIoU (ss/ms) | #params | FLOPs | Config | Download | |
|
| :-----------: | :------------: | :--------: | :-----------: | :-----: | :---: | :--------------------------------------------------------------------------------------------: | :----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | |
|
| UperNet | InternImage-T | 512x1024 | 82.58 / 83.40 | 59M | 1889G | [config](./configs/cityscapes/upernet_internimage_t_512x1024_160k_cityscapes.py) | [ckpt](https://huggingface.co/OpenGVLab/InternImage/resolve/main/upernet_internimage_t_512x1024_160k_cityscapes.pth) \| [log](https://huggingface.co/OpenGVLab/InternImage/raw/main/upernet_internimage_t_512x1024_160k_cityscapes.log.json) | |
|
| UperNet | InternImage-S | 512x1024 | 82.74 / 83.45 | 80M | 2035G | [config](./configs/cityscapes/upernet_internimage_s_512x1024_160k_cityscapes.py) | [ckpt](https://huggingface.co/OpenGVLab/InternImage/resolve/main/upernet_internimage_s_512x1024_160k_cityscapes.pth) \| [log](https://huggingface.co/OpenGVLab/InternImage/raw/main/upernet_internimage_s_512x1024_160k_cityscapes.log.json) | |
|
| UperNet | InternImage-B | 512x1024 | 83.18 / 83.97 | 128M | 2369G | [config](./configs/cityscapes/upernet_internimage_b_512x1024_160k_cityscapes.py) | [ckpt](https://huggingface.co/OpenGVLab/InternImage/resolve/main/upernet_internimage_b_512x1024_160k_cityscapes.pth) \| [log](https://huggingface.co/OpenGVLab/InternImage/raw/main/upernet_internimage_b_512x1024_160k_cityscapes.log.json) | |
|
| UperNet | InternImage-L | 512x1024 | 83.68 / 84.41 | 256M | 3234G | [config](./configs/cityscapes/upernet_internimage_l_512x1024_160k_cityscapes.py) | [ckpt](https://huggingface.co/OpenGVLab/InternImage/resolve/main/upernet_internimage_l_512x1024_160k_cityscapes.pth) \| [log](https://huggingface.co/OpenGVLab/InternImage/raw/main/upernet_internimage_l_512x1024_160k_cityscapes.log.json) | |
|
| UperNet\* | InternImage-L | 512x1024 | 85.94 / 86.22 | 256M | 3234G | [config](./configs/cityscapes/upernet_internimage_l_512x1024_160k_mapillary2cityscapes.py) | [ckpt](https://huggingface.co/OpenGVLab/InternImage/resolve/main/upernet_internimage_l_512x1024_160k_mapillary2cityscapes.pth) \| [log](https://huggingface.co/OpenGVLab/InternImage/raw/main/upernet_internimage_l_512x1024_160k_mapillary2cityscapes.log.json) | |
|
| UperNet | InternImage-XL | 512x1024 | 83.62 / 84.28 | 368M | 4022G | [config](./configs/cityscapes/upernet_internimage_xl_512x1024_160k_cityscapes.py) | [ckpt](https://huggingface.co/OpenGVLab/InternImage/resolve/main/upernet_internimage_xl_512x1024_160k_cityscapes.pth) \| [log](https://huggingface.co/OpenGVLab/InternImage/raw/main/upernet_internimage_xl_512x1024_160k_cityscapes.log.json) | |
|
| UperNet\* | InternImage-XL | 512x1024 | 86.20 / 86.42 | 368M | 4022G | [config](./configs/cityscapes/upernet_internimage_xl_512x1024_160k_mapillary2cityscapes.py) | [ckpt](https://huggingface.co/OpenGVLab/InternImage/resolve/main/upernet_internimage_xl_512x1024_160k_mapillary2cityscapes.pth) \| [log](https://huggingface.co/OpenGVLab/InternImage/raw/main/upernet_internimage_xl_512x1024_160k_mapillary2cityscapes.log.json) | |
|
| SegFormer\* | InternImage-L | 512x1024 | 85.16 / 85.67 | 220M | 1580G | [config](./configs/cityscapes/segformer_internimage_l_512x1024_160k_mapillary2cityscapes.py) | [ckpt](https://huggingface.co/OpenGVLab/InternImage/resolve/main/segformer_internimage_l_512x1024_160k_mapillary2cityscapes.pth) \| [log](https://huggingface.co/OpenGVLab/InternImage/raw/main/segformer_internimage_l_512x1024_160k_mapillary2cityscapes.log.json) | |
|
| SegFormer\* | InternImage-XL | 512x1024 | 85.41 / 85.93 | 330M | 2364G | [config](./configs/cityscapes/segformer_internimage_xl_512x1024_160k_mapillary2cityscapes.py) | [ckpt](https://huggingface.co/OpenGVLab/InternImage/resolve/main/segformer_internimage_xl_512x1024_160k_mapillary2cityscapes.pth) \| [log](https://huggingface.co/OpenGVLab/InternImage/raw/main/segformer_internimage_xl_512x1024_160k_mapillary2cityscapes.log.json) | |
|
| Mask2Former\* | InternImage-H | 1024x1024 | 86.37 / 86.96 | 1094M | 7878G | [config](./configs/cityscapes/mask2former_internimage_h_1024x1024_80k_mapillary2cityscapes.py) | [ckpt](https://huggingface.co/OpenGVLab/InternImage/resolve/main/mask2former_internimage_h_1024x1024_80k_mapillary2cityscapes.pth) \| [log](https://huggingface.co/OpenGVLab/InternImage/raw/main/mask2former_internimage_h_1024x1024_80k_mapillary2cityscapes.log.json) | |
|
|
|
\* denotes the model is trained using extra Mapillary dataset. |
|
|
|
</div> |
|
|
|
</details> |
|
|
|
<details> |
|
<summary> Dataset: COCO-Stuff-164K </summary> |
|
<br> |
|
<div> |
|
|
|
| method | backbone | resolution | mIoU (ss/ms) | #params | FLOPs | Config | Download | |
|
| :---------: | :-----------: | :--------: | :----------: | :-----: | :---: | :-----------------------------------------------------------------------------------: | :--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | |
|
| Mask2Former | InternImage-H | 896x896 | 52.6 / 52.8 | 1.31B | 4635G | [config](./configs/coco_stuff164k/mask2former_internimage_h_896_80k_cocostuff164k.py) | [ckpt](https://huggingface.co/OpenGVLab/InternImage/resolve/main/mask2former_internimage_h_896_80k_cocostuff164k.pth) \| [log](https://huggingface.co/OpenGVLab/InternImage/raw/main/mask2former_internimage_h_896_80k_cocostuff164k.log.json) | |
|
|
|
</div> |
|
|
|
</details> |
|
|
|
<details> |
|
<summary> Dataset: COCO-Stuff-10K </summary> |
|
<br> |
|
<div> |
|
|
|
| method | backbone | resolution | mIoU (ss/ms) | #params | FLOPs | Config | Download | |
|
| :---------: | :-----------: | :--------: | :----------: | :-----: | :---: | :-----------------------------------------------------------------------------------------: | :----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | |
|
| Mask2Former | InternImage-H | 512x512 | 59.2 / 59.6 | 1.28B | 1528G | [config](./configs/coco_stuff10k/mask2former_internimage_h_512_40k_cocostuff164k_to_10k.py) | [ckpt](https://huggingface.co/OpenGVLab/InternImage/resolve/main/mask2former_internimage_h_512_40k_cocostuff164k_to_10k.pth) \| [log](https://huggingface.co/OpenGVLab/InternImage/raw/main/mask2former_internimage_h_512_40k_cocostuff164k_to_10k.log.json) | |
|
|
|
</div> |
|
|
|
</details> |
|
|
|
<details> |
|
<summary> Dataset: Pascal-Context-59 </summary> |
|
<br> |
|
<div> |
|
|
|
| method | backbone | resolution | mIoU (ss/ms) | #param | FLOPs | Config | Download | |
|
| :---------: | :-----------: | :--------: | :----------: | :----: | :---: | :---------------------------------------------------------------------------------------: | :----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | |
|
| Mask2Former | InternImage-H | 480x480 | 69.7 / 70.3 | 1.07B | 867G | [config](./configs/pascal_context/mask2former_internimage_h_480_40k_pascal_context_59.py) | [ckpt](https://huggingface.co/OpenGVLab/InternImage/resolve/main/mask2former_internimage_h_480_40k_pascal_context_59.pth) \| [log](https://huggingface.co/OpenGVLab/InternImage/raw/main/mask2former_internimage_h_480_40k_pascal_context_59.log.json) | |
|
|
|
</div> |
|
|
|
</details> |
|
|
|
<details> |
|
<summary> Dataset: NYU-Depth-V2 </summary> |
|
<br> |
|
<div> |
|
|
|
| method | backbone | resolution | mIoU (ss/ms) | #param | FLOPs | Config | Download | |
|
| :---------: | :-----------: | :--------: | :----------: | :----: | :---: | :-----------------------------------------------------------------------: | :------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | |
|
| Mask2Former | InternImage-H | 480x480 | 67.1 / 68.1 | 1.07B | 867G | [config](./configs/nyu_depth_v2/mask2former_internimage_h_480_40k_nyu.py) | [ckpt](https://huggingface.co/OpenGVLab/InternImage/resolve/main/mask2former_internimage_h_480_40k_nyu.pth) \| [log](https://huggingface.co/OpenGVLab/InternImage/raw/main/mask2former_internimage_h_480_40k_nyu.log.json) | |
|
|
|
</div> |
|
|
|
</details> |
|
|
|
<details> |
|
<summary> Dataset: Mapillary </summary> |
|
<br> |
|
<div> |
|
|
|
| method | backbone | resolution | #param | FLOPs | Config | Download | |
|
| :---------: | :------------: | :--------: | :----: | :---: | :------------------------------------------------------------------------------: | :-------------------------------------------------------------------------------------------------------------------: | |
|
| UperNet | InternImage-L | 512x1024 | 256M | 3234G | [config](./configs/mapillary/upernet_internimage_l_512x1024_80k_mapillary.py) | [ckpt](https://huggingface.co/OpenGVLab/InternImage/resolve/main/upernet_internimage_l_512x1024_80k_mapillary.pth) | |
|
| UperNet | InternImage-XL | 512x1024 | 368M | 4022G | [config](./configs/mapillary/upernet_internimage_xl_512x1024_80k_mapillary.py) | [ckpt](https://huggingface.co/OpenGVLab/InternImage/resolve/main/upernet_internimage_xl_512x1024_80k_mapillary.pth) | |
|
| SegFormer | InternImage-L | 512x1024 | 220M | 1580G | [config](./configs/mapillary/segformer_internimage_l_512x1024_80k_mapillary.py) | [ckpt](https://huggingface.co/OpenGVLab/InternImage/resolve/main/segformer_internimage_l_512x1024_80k_mapillary.pth) | |
|
| SegFormer | InternImage-XL | 512x1024 | 330M | 2364G | [config](./configs/mapillary/segformer_internimage_xl_512x1024_80k_mapillary.py) | [ckpt](https://huggingface.co/OpenGVLab/InternImage/resolve/main/segformer_internimage_xl_512x1024_80k_mapillary.pth) | |
|
| Mask2Former | InternImage-H | 896x896 | 1094M | 7878G | [config](./configs/mapillary/mask2former_internimage_h_896x896_80k_mapillary.py) | [ckpt](https://huggingface.co/OpenGVLab/InternImage/resolve/main/mask2former_internimage_h_896x896_80k_mapillary.pth) | |
|
|
|
</div> |
|
|
|
</details> |
|
|
|
## Evaluation |
|
|
|
To evaluate our `InternImage` on ADE20K val, run: |
|
|
|
```bash |
|
sh dist_test.sh <config-file> <checkpoint> <gpu-num> --eval mIoU |
|
``` |
|
|
|
For example, to evaluate the `InternImage-T` with a single GPU: |
|
|
|
```bash |
|
python test.py configs/ade20k/upernet_internimage_t_512_160k_ade20k.py pretrained/upernet_internimage_t_512_160k_ade20k.pth --eval mIoU |
|
``` |
|
|
|
For example, to evaluate the `InternImage-B` with a single node with 8 GPUs: |
|
|
|
```bash |
|
sh dist_test.sh configs/ade20k/upernet_internimage_b_512_160k_ade20k.py pretrained/upernet_internimage_b_512_160k_ade20k.pth 8 --eval mIoU |
|
``` |
|
|
|
## Training |
|
|
|
To train an `InternImage` on ADE20K, run: |
|
|
|
```bash |
|
sh dist_train.sh <config-file> <gpu-num> |
|
``` |
|
|
|
For example, to train `InternImage-T` with 8 GPU on 1 node (total batch size 16), run: |
|
|
|
```bash |
|
sh dist_train.sh configs/ade20k/upernet_internimage_t_512_160k_ade20k.py 8 |
|
``` |
|
|
|
## Manage Jobs with Slurm |
|
|
|
For example, to train `InternImage-XL` with 8 GPU on 1 node (total batch size 16), run: |
|
|
|
```bash |
|
GPUS=8 sh slurm_train.sh <partition> <job-name> configs/ade20k/upernet_internimage_xl_640_160k_ade20k.py |
|
``` |
|
|
|
## Image Demo |
|
|
|
To inference a single/multiple image like this. |
|
If you specify image containing directory instead of a single image, it will process all the images in the directory. |
|
|
|
``` |
|
CUDA_VISIBLE_DEVICES=0 python image_demo.py \ |
|
data/ade/ADEChallengeData2016/images/validation/ADE_val_00000591.jpg \ |
|
configs/ade20k/upernet_internimage_t_512_160k_ade20k.py \ |
|
checkpoint_dir/seg/upernet_internimage_t_512_160k_ade20k.pth \ |
|
--palette ade20k |
|
``` |
|
|
|
## Export |
|
|
|
Install `mmdeploy` at first: |
|
|
|
```shell |
|
pip install mmdeploy==0.14.0 |
|
``` |
|
|
|
To export a segmentation model from PyTorch to TensorRT, run: |
|
|
|
```shell |
|
MODEL="model_name" |
|
CKPT_PATH="/path/to/model/ckpt.pth" |
|
|
|
python deploy.py \ |
|
"./deploy/configs/mmseg/segmentation_tensorrt_static-512x512.py" \ |
|
"./configs/ade20k/${MODEL}.py" \ |
|
"${CKPT_PATH}" \ |
|
"./deploy/demo.png" \ |
|
--work-dir "./work_dirs/mmseg/${MODEL}" \ |
|
--device cuda \ |
|
--dump-info |
|
``` |
|
|
|
For example, to export `upernet_internimage_t_512_160k_ade20k` from PyTorch to TensorRT, run: |
|
|
|
```shell |
|
MODEL="upernet_internimage_t_512_160k_ade20k" |
|
CKPT_PATH="/path/to/model/ckpt/upernet_internimage_t_512_160k_ade20k.pth" |
|
|
|
python deploy.py \ |
|
"./deploy/configs/mmseg/segmentation_tensorrt_static-512x512.py" \ |
|
"./configs/ade20k/${MODEL}.py" \ |
|
"${CKPT_PATH}" \ |
|
"./deploy/demo.png" \ |
|
--work-dir "./work_dirs/mmseg/${MODEL}" \ |
|
--device cuda \ |
|
--dump-info |
|
``` |
|
|