---
language:
- en
metrics:
- precision
pipeline_tag: image-segmentation
tags:
- Transformer
- CAM
---
# CAM-Seg: A Continuous-valued Embedding Approach for Semantic Image Generation

**Official PyTorch Implementation**

This is a PyTorch/GPU implementation of the paper [CAM-Seg: A Continuous-valued Embedding Approach for Semantic Image Generation](https://arxiv.org/abs/2503.15617).

```
@article{ahmed2025cam,
  title={CAM-Seg: A Continuous-valued Embedding Approach for Semantic Image Generation},
  author={Ahmed, Masud and Hasan, Zahid and Haque, Syed Arefinul and Faridee, Abu Zaher Md and Purushotham, Sanjay and You, Suya and Roy, Nirmalya},
  journal={arXiv preprint arXiv:2503.15617},
  year={2025}
}
```

GitHub Repo: [https://github.com/mahmed10/CAMSS](https://github.com/mahmed10/CAMSS)
## Abstract
Traditional transformer-based semantic segmentation relies on quantized embeddings. However, our analysis reveals that autoencoder accuracy on segmentation masks using quantized embeddings (e.g., VQ-VAE) is 8\% lower than with continuous-valued embeddings (e.g., KL-VAE). Motivated by this, we propose a continuous-valued embedding framework for semantic segmentation. By reformulating semantic mask generation as a continuous image-to-embedding diffusion process, our approach eliminates the need for discrete latent representations while preserving fine-grained spatial and semantic details. Our key contribution includes a diffusion-guided autoregressive transformer that learns a continuous semantic embedding space by modeling long-range dependencies in image features. Our framework is a unified architecture combining a VAE encoder for continuous feature extraction, a diffusion-guided transformer for conditioned embedding generation, and a VAE decoder for semantic mask reconstruction. Our setting facilitates zero-shot domain adaptation, enabled by the continuity of the embedding space. Experiments across diverse datasets (e.g., Cityscapes and domain-shifted variants) demonstrate state-of-the-art robustness to distribution shifts, including adverse weather (e.g., fog, snow) and viewpoint variations. Our model also exhibits strong noise resilience, achieving robust performance ($\approx$ 95\% AP compared to baseline) under Gaussian noise, moderate motion blur, and moderate brightness/contrast variations, while experiencing only a moderate impact ($\approx$ 90\% AP compared to baseline) from 50\% salt-and-pepper noise and from saturation and hue shifts.

## Results
Trained on the Cityscapes dataset and tested on the SemanticKITTI, ACDC, and CAD-EdgeTune datasets:
<p align="center">
  <img src="demo/qualitative.png" width="720">
</p>

Quantitative results of semantic segmentation under various noise conditions
<p align="center">
    <table>
      <tr>
        <td align="center"><img src="demo/saltpepper_noise.png" width="200"/><br>Salt & Pepper Noise</td>
        <td align="center"><img src="demo/motion_blur.png" width="200"/><br>Motion Blur</td>
        <td align="center"><img src="demo/gaussian_noise.png" width="200"/><br>Gaussian Noise</td>
        <td align="center"><img src="demo/gaussian_blur.png" width="200"/><br>Gaussian Blur</td>
      </tr>
      <tr>
        <td align="center"><img src="demo/brightness.png" width="200"/><br>Brightness Variation</td>
        <td align="center"><img src="demo/contrast.png" width="200"/><br>Contrast Variation</td>
        <td align="center"><img src="demo/saturation.png" width="200"/><br>Saturation Variation</td>
        <td align="center"><img src="demo/hue.png" width="200"/><br>Hue Variation</td>
      </tr>
    </table>
</p>
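
To apply perturbations like those above to your own images, here is a minimal NumPy sketch of two of them. It assumes an `H×W×C` `uint8` image and treats `density` as the fraction of corrupted pixels (so `0.5` would correspond to the 50% salt-and-pepper setting mentioned in the abstract). The helper names and that interpretation are assumptions; this is not the authors' evaluation code.

```python
import numpy as np


def add_salt_and_pepper(img: np.ndarray, density: float, seed: int = 0) -> np.ndarray:
    """Hypothetical helper: set a `density` fraction of pixels to white (salt) or black (pepper)."""
    rng = np.random.default_rng(seed)
    out = img.copy()
    h, w = img.shape[:2]
    # Pick the pixels to corrupt, then split them roughly evenly between salt and pepper.
    corrupt = rng.random((h, w)) < density
    salt = rng.random((h, w)) < 0.5
    out[corrupt & salt] = 255
    out[corrupt & ~salt] = 0
    return out


def add_gaussian_noise(img: np.ndarray, sigma: float, seed: int = 0) -> np.ndarray:
    """Hypothetical helper: add zero-mean Gaussian noise with std `sigma` (pixel units)."""
    rng = np.random.default_rng(seed)
    noisy = img.astype(np.float32) + rng.normal(0.0, sigma, size=img.shape)
    # Clip back to the valid 8-bit range before converting.
    return np.clip(noisy, 0, 255).astype(np.uint8)


if __name__ == "__main__":
    image = np.full((64, 64, 3), 128, dtype=np.uint8)  # flat gray test image
    sp = add_salt_and_pepper(image, density=0.5)
    changed = np.any(sp != image, axis=-1).mean()
    print(f"fraction of corrupted pixels: {changed:.2f}")
```

Fixing the seed keeps the corruption pattern identical across runs, which makes it easier to compare models on the same perturbed inputs.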

## Prerequisites
To set up the Docker environment, first edit `docker_env/Makefile`:
```
IMAGE=img_name/dl-aio
CONTAINER=container_name
AVAILABLE_GPUS='0,1,2,3'
LOCAL_JUPYTER_PORT=18888
LOCAL_TENSORBOARD_PORT=18006
PASSWORD=yourpassword
WORKSPACE=workspace_directory
```
- Replace `img_name`, `container_name`, `yourpassword`, and `workspace_directory` with your own values, and adjust `AVAILABLE_GPUS`, `LOCAL_JUPYTER_PORT`, and `LOCAL_TENSORBOARD_PORT` as needed.

1. On first use, run the following commands in a terminal:
```
cd docker_env
make docker-build
make docker-run
```
2. For subsequent use of the Docker environment:
- To stop the environment: `make docker-stop`
- To resume the environment: `make docker-resume`

To start coding, open `ip_address:jupyter_port` in a web browser, e.g., `http://localhost:18888`.

## Dataset
Four datasets are used in this work:
1. [Cityscapes Dataset](https://www.cityscapes-dataset.com/)
2. [KITTI Dataset](https://www.cvlibs.net/datasets/kitti/eval_step.php)
3. [ACDC Dataset](https://acdc.vision.ee.ethz.ch/)
4. [CAD-EdgeTune Dataset](https://ieee-dataport.org/documents/cad-edgetune)

**Modify the `trainlist.txt` and `vallist.txt` files to change the train/test split.**
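
If you want to regenerate a split, the sketch below writes a random train/test split from a complete image list (for example, CADEdgeTune's `all.txt`). It assumes each list file holds one image entry per line; the helper name, the 80/20 ratio, and the seed are illustrative choices, not values from the paper. Check the format of the provided list files before overwriting them.

```python
import random
from pathlib import Path


def write_split(all_list: str, out_dir: str, val_fraction: float = 0.2, seed: int = 0) -> tuple[int, int]:
    """Hypothetical helper: shuffle the entries of `all_list` and write
    trainlist.txt / vallist.txt into `out_dir`. Returns (n_train, n_val)."""
    entries = [line.strip() for line in Path(all_list).read_text().splitlines() if line.strip()]
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    rng.shuffle(entries)
    n_val = int(round(len(entries) * val_fraction))
    val, train = entries[:n_val], entries[n_val:]
    out = Path(out_dir)
    (out / "trainlist.txt").write_text("\n".join(train) + "\n")
    (out / "vallist.txt").write_text("\n".join(val) + "\n")
    return len(train), len(val)


if __name__ == "__main__":
    # Example (paths follow the CADEdgeTune layout below; adjust to your setup):
    # write_split("CADEdgeTune/all.txt", "CADEdgeTune")
    pass
```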

### Dataset structure
- Cityscapes Dataset
```
|-CityScapes
|----leftImg8bit
|--------train
|------------aachen #contains the RGB images
|------------bochum #contains the RGB images
|................
|------------zurich #contains the RGB images
|--------val
|................
|----gtFine
|--------train
|------------aachen #contains semantic segmentation labels
|------------bochum #contains semantic segmentation labels
|................
|------------zurich #contains semantic segmentation labels
|--------val
|................
|----trainlist.txt #image list used for training
|----vallist.txt #image list used for testing
|----cityscape.yaml #configuration file for CityScapes dataset
```
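
In the standard Cityscapes release, each label file in `gtFine` mirrors its image in `leftImg8bit`: same split, same city folder, same file stem. A minimal sketch of that mapping follows. Which label variant this repository reads (`gtFine_labelIds.png`, or the `gtFine_labelTrainIds.png` files generated by `cityscapesScripts`) is an assumption, so it is exposed as the `label_suffix` parameter.

```python
from pathlib import PurePosixPath


def cityscapes_label_path(image_path: str, label_suffix: str = "gtFine_labelIds.png") -> str:
    """Hypothetical helper: map
    .../leftImg8bit/<split>/<city>/<stem>_leftImg8bit.png
    to .../gtFine/<split>/<city>/<stem>_<label_suffix>."""
    p = PurePosixPath(image_path)
    if not p.name.endswith("_leftImg8bit.png"):
        raise ValueError(f"not a Cityscapes leftImg8bit file: {image_path}")
    stem = p.name[: -len("leftImg8bit.png")]  # keeps the trailing underscore
    # Swap the top-level image folder for the label folder; <split>/<city> stay the same.
    parts = ["gtFine" if part == "leftImg8bit" else part for part in p.parent.parts]
    return str(PurePosixPath(*parts, stem + label_suffix))


if __name__ == "__main__":
    print(cityscapes_label_path(
        "CityScapes/leftImg8bit/train/aachen/aachen_000000_000019_leftImg8bit.png"))
```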

- ACDC Dataset
```
|-ACDC
|----rgb_anon 
|--------fog
|------------train
|----------------GOPR0475 #contains the RGB images
|----------------GOPR0476 #contains the RGB images
|................
|----------------GP020478 #contains the RGB images
|------------val
|................
|--------rain
|................
|--------snow
|................
|----gt 
|--------fog
|------------train
|----------------GOPR0475 #contains semantic segmentation labels
|----------------GOPR0476 #contains semantic segmentation labels
|................
|----------------GP020478 #contains semantic segmentation labels
|------------val
|................
|--------rain
|................
|--------snow
|................
|----vallist_fog.txt #image list used for testing fog data
|----vallist_rain.txt #image list used for testing rain data
|----vallist_snow.txt #image list used for testing snow data
|----acdc.yaml #configuration file for ACDC dataset
```

- SemanticKitti Dataset
```
|-SemanticKitti
|----training 
|--------image_02
|------------0000 #contains the RGB images
|------------0001 #contains the RGB images
|................
|------------0020 #contains the RGB images
|----kitti-step
|--------panoptic_maps
|------------train
|----------------0000 #contains semantic segmentation labels
|----------------0001 #contains semantic segmentation labels
|................
|----------------0020 #contains semantic segmentation labels
|------------val
|................
|----trainlist.txt #image list used for training
|----vallist.txt #image list used for testing
|----semantickitti.yaml #configuration file for SemanticKitti dataset
```

- CADEdgeTune Dataset
```
|-CADEdgeTune
|----SEQ1
|--------Images #contains the RGB images
|--------LabelMasks #contains semantic segmentation labels
|----SEQ2
|--------Images #contains the RGB images
|--------LabelMasks #contains semantic segmentation labels
|................
|----SEQ17
|----all.txt #complete image list
|----trainlist.txt #image list used for training
|----vallist.txt #image list used for testing
|----cadedgetune.yaml #configuration file for CADEdgeTune dataset
```
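
Before training or validation, it can help to confirm that each dataset root matches the layouts above. The sketch below checks a subset of the paths shown in those trees, under a `dataset/` root as used in the validation snippet (`dataset/CityScapes/vallist.txt`). The `EXPECTED` table is transcribed from this README, not from the loader code, so treat it as a convenience check rather than the loaders' exact requirements.

```python
from pathlib import Path

# Subset of the paths shown in the dataset trees above (transcribed from this README).
EXPECTED = {
    "CityScapes": ["leftImg8bit/train", "leftImg8bit/val", "gtFine/train", "gtFine/val",
                   "trainlist.txt", "vallist.txt", "cityscape.yaml"],
    "ACDC": ["rgb_anon/fog", "rgb_anon/rain", "rgb_anon/snow", "gt/fog", "gt/rain", "gt/snow",
             "vallist_fog.txt", "vallist_rain.txt", "vallist_snow.txt", "acdc.yaml"],
    "SemanticKitti": ["training/image_02", "kitti-step/panoptic_maps/train",
                      "kitti-step/panoptic_maps/val", "trainlist.txt", "vallist.txt",
                      "semantickitti.yaml"],
    "CADEdgeTune": ["SEQ1/Images", "SEQ1/LabelMasks", "all.txt", "trainlist.txt",
                    "vallist.txt", "cadedgetune.yaml"],
}


def missing_paths(data_root: str, dataset: str) -> list[str]:
    """Return the expected entries that do not exist under <data_root>/<dataset>."""
    base = Path(data_root) / dataset
    return [rel for rel in EXPECTED[dataset] if not (base / rel).exists()]


if __name__ == "__main__":
    for name in EXPECTED:
        missing = missing_paths("dataset", name)
        print(f"{name}: {'OK' if not missing else 'missing ' + ', '.join(missing)}")
```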


## Weights
To download the pretrained weights, visit the [Hugging Face repo](https://huggingface.co/mahmed10/CAM-Seg).
- **LDM model**: a pretrained model from Rombach et al.'s Latent Diffusion Models is used ([link](https://huggingface.co/mahmed10/CAM-Seg/resolve/main/pretrained_models/vae/modelf16.ckpt)).
- **MAR model**: the following MAR model is used:

|Training Data|Model|Params|Link|
|-------------|-----|------|----|
|Cityscapes | Mar-base| 217M|[link](https://huggingface.co/mahmed10/CAM-Seg/resolve/main/pretrained_models/mar/city768.16.pth)|


Download these weight files and organize them as follows:
```
|-pretrained_models
|----mar
|--------city768.16.pth
|----vae
|--------modelf16.ckpt
```

**Alternatively, use the following code to download the pretrained weights automatically:**
```python
import os

import requests

# Map each Hugging Face download URL to its local destination path
files_to_download = {
    "https://huggingface.co/mahmed10/CAM-Seg/resolve/main/pretrained_models/vae/modelf16.ckpt":
        "pretrained_models/vae/modelf16.ckpt",
    "https://huggingface.co/mahmed10/CAM-Seg/resolve/main/pretrained_models/mar/city768.16.pth":
        "pretrained_models/mar/city768.16.pth",
}

for url, path in files_to_download.items():
    os.makedirs(os.path.dirname(path), exist_ok=True)

    print(f"Downloading from {url}...")
    # Stream the response so large checkpoints are not held in memory all at once;
    # the context manager releases the connection when done.
    with requests.get(url, stream=True, timeout=60) as response:
        if response.status_code == 200:
            with open(path, "wb") as f:
                for chunk in response.iter_content(chunk_size=8192):
                    f.write(chunk)
            print(f"Saved to {path}")
        else:
            print(f"Failed to download from {url}, status code {response.status_code}")
```

## Validation
Open the `validation.ipynb` notebook.

Edit **Block 6** to select the dataset used for validation:

```
dataset_train = cityscapes.CityScapes('dataset/CityScapes/vallist.txt', data_set= 'val', transform=transform_train,seed=36, img_size=768)
# dataset_train = umbc.UMBC('dataset/UMBC/all.txt', data_set= 'val', transform=transform_train,seed=36, img_size=768)
# dataset_train = acdc.ACDC('dataset/ACDC/vallist_fog.txt', data_set= 'val', transform=transform_train,seed=36, img_size=768)
# dataset_train = semantickitti.SemanticKITTI('dataset/SemanticKitti/vallist.txt', data_set= 'val', transform=transform_train, seed=36, img_size=768)
```

Then run all the blocks.

## Training

### From Scratch

Run the following command in a terminal:
```
torchrun --nproc_per_node=4 train.py
```

Checkpoints are saved in the `output_dir/year.month.day.hour.min` folder, e.g., `output_dir/2025.05.09.02.27`.

### Resume Training

To resume, pass the name of the run folder to `--resume`:
```
torchrun --nproc_per_node=4 train.py --resume year.month.day.hour.min
```

For example:
```
torchrun --nproc_per_node=4 train.py --resume 2025.05.09.02.27
```
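
To avoid typing the timestamp by hand, a small sketch can find the most recent run folder and print the matching resume command. It assumes run folders live directly under `output_dir` and are named exactly in the `year.month.day.hour.min` format shown above; the helper name is hypothetical, and the `--nproc_per_node=4` value simply mirrors the commands in this section.

```python
from datetime import datetime
from pathlib import Path
from typing import Optional

RUN_FORMAT = "%Y.%m.%d.%H.%M"  # matches folder names like 2025.05.09.02.27


def latest_run(output_dir: str = "output_dir") -> Optional[str]:
    """Hypothetical helper: return the newest timestamped run folder name, or None.

    Folders whose names do not parse as year.month.day.hour.min are ignored.
    """
    root = Path(output_dir)
    if not root.is_dir():
        return None
    stamped = []
    for child in root.iterdir():
        if not child.is_dir():
            continue
        try:
            stamped.append((datetime.strptime(child.name, RUN_FORMAT), child.name))
        except ValueError:
            continue  # not a timestamped run folder
    return max(stamped)[1] if stamped else None


if __name__ == "__main__":
    run = latest_run()
    if run is None:
        print("no run folders found")
    else:
        print(f"torchrun --nproc_per_node=4 train.py --resume {run}")
```

Parsing the names as dates, rather than sorting strings, keeps the selection correct even if a folder name is not zero-padded.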

## Acknowledgement
This code builds on the following codebases:
1. [latent-diffusion](https://github.com/CompVis/latent-diffusion)
2. [mar](https://github.com/LTH14/mar)