ablattmann
		
	commited on
		
		
					Commit 
							
							·
						
						ec3a273
	
1
								Parent(s):
							
							ebcf159
								
add configs for training unconditional/class-conditional ldms
Browse files- README.md +85 -15
- configs/latent-diffusion/celebahq-ldm-vq-4.yaml +86 -0
- configs/latent-diffusion/cin-ldm-vq-f8.yaml +98 -0
- configs/latent-diffusion/ffhq-ldm-vq-4.yaml +85 -0
- configs/latent-diffusion/lsun_bedrooms-ldm-vq-4.yaml +85 -0
- configs/latent-diffusion/{lsun_churches_f8-autoencoder-ldm.yaml → lsun_churches-ldm-kl-8.yaml} +3 -7
- ldm/models/diffusion/ddim.py +4 -5
- ldm/models/diffusion/ddpm.py +37 -22
- ldm/modules/diffusionmodules/util.py +6 -0
- main.py +4 -1
- models/ldm/semantic_synthesis256/config.yaml +59 -0
- scripts/download_first_stages.sh +2 -2
- scripts/download_models.sh +8 -1
    	
        README.md
    CHANGED
    
    | @@ -55,18 +55,7 @@ bash scripts/download_first_stages.sh | |
| 55 | 
             
            ```
         | 
| 56 |  | 
| 57 | 
             
            The first stage models can then be found in `models/first_stage_models/<model_spec>`
         | 
| 58 | 
            -
            ### Training autoencoder models
         | 
| 59 |  | 
| 60 | 
            -
            Configs for training a KL-regularized autoencoder on ImageNet are provided at `configs/autoencoder`.
         | 
| 61 | 
            -
            Training can be started by running
         | 
| 62 | 
            -
            ```
         | 
| 63 | 
            -
            CUDA_VISIBLE_DEVICES=<GPU_ID> python main.py --base configs/autoencoder/<config_spec> -t --gpus 0,    
         | 
| 64 | 
            -
            ```
         | 
| 65 | 
            -
            where `config_spec` is one of {`autoencoder_kl_8x8x64.yaml`(f=32, d=64), `autoencoder_kl_16x16x16.yaml`(f=16, d=16), 
         | 
| 66 | 
            -
            `autoencoder_kl_32x32x4`(f=8, d=4), `autoencoder_kl_64x64x3`(f=4, d=3)}.
         | 
| 67 | 
            -
             | 
| 68 | 
            -
            For training VQ-regularized models, see the [taming-transformers](https://github.com/CompVis/taming-transformers) 
         | 
| 69 | 
            -
            repository.
         | 
| 70 |  | 
| 71 |  | 
| 72 | 
             
            ## Pretrained LDMs
         | 
| @@ -78,9 +67,10 @@ repository. | |
| 78 | 
             
            | LSUN-Bedrooms                   | Unconditional Image Synthesis   |  LDM-VQ-4 (200 DDIM steps, eta=1)| 2.95 (3.0)          | 2.22 (2.23)| 0.66 | 0.48 | https://ommer-lab.com/files/latent-diffusion/lsun_bedrooms.zip |                                                 |  
         | 
| 79 | 
             
            | ImageNet                        | Class-conditional Image Synthesis | LDM-VQ-8 (200 DDIM steps, eta=1) | 7.77(7.76)* /15.82** | 201.56(209.52)* /78.82** | 0.84* / 0.65** | 0.35* / 0.63** |   https://ommer-lab.com/files/latent-diffusion/cin.zip                                                                   | *: w/ guiding, classifier_scale 10  **: w/o guiding, scores in bracket calculated with script provided by [ADM](https://github.com/openai/guided-diffusion) |   
         | 
| 80 | 
             
            | Conceptual Captions             |  Text-conditional Image Synthesis | LDM-VQ-f4 (100 DDIM steps, eta=0) | 16.79         | 13.89           | N/A | N/A |              https://ommer-lab.com/files/latent-diffusion/text2img.zip                                | finetuned from LAION                            |   
         | 
| 81 | 
            -
            | OpenImages                      | Super-resolution   |  | 
| 82 | 
             
            | OpenImages                      | Layout-to-Image Synthesis    | LDM-VQ-4 (200 DDIM steps, eta=0) | 32.02         | 15.92           | N/A    | N/A    |                  https://ommer-lab.com/files/latent-diffusion/layout2img_model.zip                                           |                                                 | 
         | 
| 83 | 
            -
            | Landscapes       | 
|  | |
| 84 |  | 
| 85 |  | 
| 86 | 
             
            ### Get the models
         | 
| @@ -116,10 +106,90 @@ python scripts/inpaint.py --indir data/inpainting_examples/ --outdir outputs/inp | |
| 116 | 
             
            `indir` should contain images `*.png` and masks `<image_fname>_mask.png` like
         | 
| 117 | 
             
            the examples provided in `data/inpainting_examples`.
         | 
| 118 |  | 
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
| 119 | 
             
            ## Coming Soon...
         | 
| 120 |  | 
| 121 | 
            -
            *  | 
| 122 | 
            -
            * Inference scripts for conditional LDMs for various conditioning modalities.
         | 
| 123 | 
             
            * In the meantime, you can play with our colab notebook https://colab.research.google.com/drive/1xqzUi2iXQXDqXBHQGP9Mqt2YrYW6cx-J?usp=sharing
         | 
| 124 | 
             
            * We will also release some further pretrained models.
         | 
| 125 |  | 
|  | |
| 55 | 
             
            ```
         | 
| 56 |  | 
| 57 | 
             
            The first stage models can then be found in `models/first_stage_models/<model_spec>`
         | 
|  | |
| 58 |  | 
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
| 59 |  | 
| 60 |  | 
| 61 | 
             
            ## Pretrained LDMs
         | 
|  | |
| 67 | 
             
            | LSUN-Bedrooms                   | Unconditional Image Synthesis   |  LDM-VQ-4 (200 DDIM steps, eta=1)| 2.95 (3.0)          | 2.22 (2.23)| 0.66 | 0.48 | https://ommer-lab.com/files/latent-diffusion/lsun_bedrooms.zip |                                                 |  
         | 
| 68 | 
             
            | ImageNet                        | Class-conditional Image Synthesis | LDM-VQ-8 (200 DDIM steps, eta=1) | 7.77(7.76)* /15.82** | 201.56(209.52)* /78.82** | 0.84* / 0.65** | 0.35* / 0.63** |   https://ommer-lab.com/files/latent-diffusion/cin.zip                                                                   | *: w/ guiding, classifier_scale 10  **: w/o guiding, scores in bracket calculated with script provided by [ADM](https://github.com/openai/guided-diffusion) |   
         | 
| 69 | 
             
            | Conceptual Captions             |  Text-conditional Image Synthesis | LDM-VQ-f4 (100 DDIM steps, eta=0) | 16.79         | 13.89           | N/A | N/A |              https://ommer-lab.com/files/latent-diffusion/text2img.zip                                | finetuned from LAION                            |   
         | 
| 70 | 
            +
            | OpenImages                      | Super-resolution   | LDM-VQ-4     | N/A            | N/A               | N/A    | N/A    |                                    https://ommer-lab.com/files/latent-diffusion/sr_bsr.zip                                    | BSR image degradation                           |
         | 
| 71 | 
             
            | OpenImages                      | Layout-to-Image Synthesis    | LDM-VQ-4 (200 DDIM steps, eta=0) | 32.02         | 15.92           | N/A    | N/A    |                  https://ommer-lab.com/files/latent-diffusion/layout2img_model.zip                                           |                                                 | 
         | 
| 72 | 
            +
            | Landscapes      |  Semantic Image Synthesis   | LDM-VQ-4  | N/A             | N/A               | N/A    | N/A    |           https://ommer-lab.com/files/latent-diffusion/semantic_synthesis256.zip                                    |                                                 |
         | 
| 73 | 
            +
            | Landscapes       |  Semantic Image Synthesis   | LDM-VQ-4  | N/A             | N/A               | N/A    | N/A    |           https://ommer-lab.com/files/latent-diffusion/semantic_synthesis.zip                                    |             finetuned on resolution 512x512                                     |
         | 
| 74 |  | 
| 75 |  | 
| 76 | 
             
            ### Get the models
         | 
|  | |
| 106 | 
             
            `indir` should contain images `*.png` and masks `<image_fname>_mask.png` like
         | 
| 107 | 
             
            the examples provided in `data/inpainting_examples`.
         | 
| 108 |  | 
| 109 | 
            +
             | 
| 110 | 
            +
            # Train your own LDMs
         | 
| 111 | 
            +
             | 
| 112 | 
            +
            ## Data preparation
         | 
| 113 | 
            +
             | 
| 114 | 
            +
            ### Faces 
         | 
| 115 | 
            +
            For downloading the CelebA-HQ and FFHQ datasets, proceed as described in the [taming-transformers](https://github.com/CompVis/taming-transformers#celeba-hq) 
         | 
| 116 | 
            +
            repository.
         | 
| 117 | 
            +
             | 
| 118 | 
            +
            ### LSUN 
         | 
| 119 | 
            +
             | 
| 120 | 
            +
            The LSUN datasets can be conveniently downloaded via the script available [here](https://github.com/fyu/lsun).
         | 
| 121 | 
            +
            We performed a custom split into training and validation images, and provide the corresponding filenames
         | 
| 122 | 
            +
            at [https://ommer-lab.com/files/lsun.zip](https://ommer-lab.com/files/lsun.zip). 
         | 
| 123 | 
            +
            After downloading, extract them to `./data/lsun`. The beds/cats/churches subsets should
         | 
| 124 | 
            +
            also be placed/symlinked at `./data/lsun/bedrooms`/`./data/lsun/cats`/`./data/lsun/churches`, respectively.
         | 
| 125 | 
            +
             | 
| 126 | 
            +
            ### ImageNet
         | 
| 127 | 
            +
            The code will try to download (through [Academic
         | 
| 128 | 
            +
            Torrents](http://academictorrents.com/)) and prepare ImageNet the first time it
         | 
| 129 | 
            +
            is used. However, since ImageNet is quite large, this requires a lot of disk
         | 
| 130 | 
            +
            space and time. If you already have ImageNet on your disk, you can speed things
         | 
| 131 | 
            +
            up by putting the data into
         | 
| 132 | 
            +
            `${XDG_CACHE}/autoencoders/data/ILSVRC2012_{split}/data/` (which defaults to
         | 
| 133 | 
            +
            `~/.cache/autoencoders/data/ILSVRC2012_{split}/data/`), where `{split}` is one
         | 
| 134 | 
            +
            of `train`/`validation`. It should have the following structure:
         | 
| 135 | 
            +
             | 
| 136 | 
            +
            ```
         | 
| 137 | 
            +
            ${XDG_CACHE}/autoencoders/data/ILSVRC2012_{split}/data/
         | 
| 138 | 
            +
            ├── n01440764
         | 
| 139 | 
            +
            │   ├── n01440764_10026.JPEG
         | 
| 140 | 
            +
            │   ├── n01440764_10027.JPEG
         | 
| 141 | 
            +
            │   ├── ...
         | 
| 142 | 
            +
            ├── n01443537
         | 
| 143 | 
            +
            │   ├── n01443537_10007.JPEG
         | 
| 144 | 
            +
            │   ├── n01443537_10014.JPEG
         | 
| 145 | 
            +
            │   ├── ...
         | 
| 146 | 
            +
            ├── ...
         | 
| 147 | 
            +
            ```
         | 
| 148 | 
            +
             | 
| 149 | 
            +
            If you haven't extracted the data, you can also place
         | 
| 150 | 
            +
            `ILSVRC2012_img_train.tar`/`ILSVRC2012_img_val.tar` (or symlinks to them) into
         | 
| 151 | 
            +
            `${XDG_CACHE}/autoencoders/data/ILSVRC2012_train/` /
         | 
| 152 | 
            +
            `${XDG_CACHE}/autoencoders/data/ILSVRC2012_validation/`, which will then be
         | 
| 153 | 
            +
            extracted into above structure without downloading it again.  Note that this
         | 
| 154 | 
            +
            will only happen if neither a folder
         | 
| 155 | 
            +
            `${XDG_CACHE}/autoencoders/data/ILSVRC2012_{split}/data/` nor a file
         | 
| 156 | 
            +
            `${XDG_CACHE}/autoencoders/data/ILSVRC2012_{split}/.ready` exist. Remove them
         | 
| 157 | 
            +
            if you want to force running the dataset preparation again.
         | 
| 158 | 
            +
             | 
| 159 | 
            +
             | 
| 160 | 
            +
            ## Model Training
         | 
| 161 | 
            +
             | 
| 162 | 
            +
            Logs and checkpoints for trained models are saved to `logs/<START_DATE_AND_TIME>_<config_spec>`.
         | 
| 163 | 
            +
             | 
| 164 | 
            +
            ### Training autoencoder models
         | 
| 165 | 
            +
             | 
| 166 | 
            +
            Configs for training a KL-regularized autoencoder on ImageNet are provided at `configs/autoencoder`.
         | 
| 167 | 
            +
            Training can be started by running
         | 
| 168 | 
            +
            ```
         | 
| 169 | 
            +
            CUDA_VISIBLE_DEVICES=<GPU_ID> python main.py --base configs/autoencoder/<config_spec>.yaml -t --gpus 0,    
         | 
| 170 | 
            +
            ```
         | 
| 171 | 
            +
            where `config_spec` is one of {`autoencoder_kl_8x8x64`(f=32, d=64), `autoencoder_kl_16x16x16`(f=16, d=16), 
         | 
| 172 | 
            +
            `autoencoder_kl_32x32x4`(f=8, d=4), `autoencoder_kl_64x64x3`(f=4, d=3)}.
         | 
| 173 | 
            +
             | 
| 174 | 
            +
            For training VQ-regularized models, see the [taming-transformers](https://github.com/CompVis/taming-transformers) 
         | 
| 175 | 
            +
            repository.
         | 
| 176 | 
            +
             | 
| 177 | 
            +
            ### Training LDMs 
         | 
| 178 | 
            +
             | 
| 179 | 
            +
            In ``configs/latent-diffusion/`` we provide configs for training LDMs on the LSUN-, CelebA-HQ, FFHQ and ImageNet datasets. 
         | 
| 180 | 
            +
            Training can be started by running
         | 
| 181 | 
            +
             | 
| 182 | 
            +
            ```shell script
         | 
| 183 | 
            +
            CUDA_VISIBLE_DEVICES=<GPU_ID> python main.py --base configs/latent-diffusion/<config_spec>.yaml -t --gpus 0,
         | 
| 184 | 
            +
            ``` 
         | 
| 185 | 
            +
             | 
| 186 | 
            +
            where ``<config_spec>`` is one of {`celebahq-ldm-vq-4`(f=4, VQ-reg. autoencoder, spatial size 64x64x3),`ffhq-ldm-vq-4`(f=4, VQ-reg. autoencoder, spatial size 64x64x3),
         | 
| 187 | 
            +
            `lsun_bedrooms-ldm-vq-4`(f=4, VQ-reg. autoencoder, spatial size 64x64x3),
         | 
| 188 | 
            +
            `lsun_churches-ldm-vq-4`(f=8, KL-reg. autoencoder, spatial size 32x32x4),`cin-ldm-vq-8`(f=8, VQ-reg. autoencoder, spatial size 32x32x4)}.
         | 
| 189 | 
            +
             | 
| 190 | 
             
            ## Coming Soon...
         | 
| 191 |  | 
| 192 | 
            +
            * More inference scripts for conditional LDMs.
         | 
|  | |
| 193 | 
             
            * In the meantime, you can play with our colab notebook https://colab.research.google.com/drive/1xqzUi2iXQXDqXBHQGP9Mqt2YrYW6cx-J?usp=sharing
         | 
| 194 | 
             
            * We will also release some further pretrained models.
         | 
| 195 |  | 
    	
        configs/latent-diffusion/celebahq-ldm-vq-4.yaml
    ADDED
    
    | @@ -0,0 +1,86 @@ | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | 
|  | |
| 1 | 
            +
            model:
         | 
| 2 | 
            +
              base_learning_rate: 2.0e-06
         | 
| 3 | 
            +
              target: ldm.models.diffusion.ddpm.LatentDiffusion
         | 
| 4 | 
            +
              params:
         | 
| 5 | 
            +
                linear_start: 0.0015
         | 
| 6 | 
            +
                linear_end: 0.0195
         | 
| 7 | 
            +
                num_timesteps_cond: 1
         | 
| 8 | 
            +
                log_every_t: 200
         | 
| 9 | 
            +
                timesteps: 1000
         | 
| 10 | 
            +
                first_stage_key: image
         | 
| 11 | 
            +
                image_size: 64
         | 
| 12 | 
            +
                channels: 3
         | 
| 13 | 
            +
                monitor: val/loss_simple_ema
         | 
| 14 | 
            +
             | 
| 15 | 
            +
                unet_config:
         | 
| 16 | 
            +
                  target: ldm.modules.diffusionmodules.openaimodel.UNetModel
         | 
| 17 | 
            +
                  params:
         | 
| 18 | 
            +
                    image_size: 64
         | 
| 19 | 
            +
                    in_channels: 3
         | 
| 20 | 
            +
                    out_channels: 3
         | 
| 21 | 
            +
                    model_channels: 224
         | 
| 22 | 
            +
                    attention_resolutions:
         | 
| 23 | 
            +
                    # note: this isn\t actually the resolution but
         | 
| 24 | 
            +
                    # the downsampling factor, i.e. this corresnponds to
         | 
| 25 | 
            +
                    # attention on spatial resolution 8,16,32, as the
         | 
| 26 | 
            +
                    # spatial reolution of the latents is 64 for f4
         | 
| 27 | 
            +
                    - 8
         | 
| 28 | 
            +
                    - 4
         | 
| 29 | 
            +
                    - 2
         | 
| 30 | 
            +
                    num_res_blocks: 2
         | 
| 31 | 
            +
                    channel_mult:
         | 
| 32 | 
            +
                    - 1
         | 
| 33 | 
            +
                    - 2
         | 
| 34 | 
            +
                    - 3
         | 
| 35 | 
            +
                    - 4
         | 
| 36 | 
            +
                    num_head_channels: 32
         | 
| 37 | 
            +
                first_stage_config:
         | 
| 38 | 
            +
                  target: ldm.models.autoencoder.VQModelInterface
         | 
| 39 | 
            +
                  params:
         | 
| 40 | 
            +
                    embed_dim: 3
         | 
| 41 | 
            +
                    n_embed: 8192
         | 
| 42 | 
            +
                    ckpt_path: models/first_stage_models/vq-f4/model.ckpt
         | 
| 43 | 
            +
                    ddconfig:
         | 
| 44 | 
            +
                      double_z: false
         | 
| 45 | 
            +
                      z_channels: 3
         | 
| 46 | 
            +
                      resolution: 256
         | 
| 47 | 
            +
                      in_channels: 3
         | 
| 48 | 
            +
                      out_ch: 3
         | 
| 49 | 
            +
                      ch: 128
         | 
| 50 | 
            +
                      ch_mult:
         | 
| 51 | 
            +
                      - 1
         | 
| 52 | 
            +
                      - 2
         | 
| 53 | 
            +
                      - 4
         | 
| 54 | 
            +
                      num_res_blocks: 2
         | 
| 55 | 
            +
                      attn_resolutions: []
         | 
| 56 | 
            +
                      dropout: 0.0
         | 
| 57 | 
            +
                    lossconfig:
         | 
| 58 | 
            +
                      target: torch.nn.Identity
         | 
| 59 | 
            +
                cond_stage_config: __is_unconditional__
         | 
| 60 | 
            +
            data:
         | 
| 61 | 
            +
              target: main.DataModuleFromConfig
         | 
| 62 | 
            +
              params:
         | 
| 63 | 
            +
                batch_size: 48
         | 
| 64 | 
            +
                num_workers: 5
         | 
| 65 | 
            +
                wrap: false
         | 
| 66 | 
            +
                train:
         | 
| 67 | 
            +
                  target: taming.data.faceshq.CelebAHQTrain
         | 
| 68 | 
            +
                  params:
         | 
| 69 | 
            +
                    size: 256
         | 
| 70 | 
            +
                validation:
         | 
| 71 | 
            +
                  target: taming.data.faceshq.CelebAHQValidation
         | 
| 72 | 
            +
                  params:
         | 
| 73 | 
            +
                    size: 256
         | 
| 74 | 
            +
             | 
| 75 | 
            +
             | 
| 76 | 
            +
            lightning:
         | 
| 77 | 
            +
              callbacks:
         | 
| 78 | 
            +
                image_logger:
         | 
| 79 | 
            +
                  target: main.ImageLogger
         | 
| 80 | 
            +
                  params:
         | 
| 81 | 
            +
                    batch_frequency: 5000
         | 
| 82 | 
            +
                    max_images: 8
         | 
| 83 | 
            +
                    increase_log_steps: False
         | 
| 84 | 
            +
             | 
| 85 | 
            +
              trainer:
         | 
| 86 | 
            +
                benchmark: True
         | 
    	
        configs/latent-diffusion/cin-ldm-vq-f8.yaml
    ADDED
    
    | @@ -0,0 +1,98 @@ | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | 
|  | |
| 1 | 
            +
            model:
         | 
| 2 | 
            +
              base_learning_rate: 1.0e-06
         | 
| 3 | 
            +
              target: ldm.models.diffusion.ddpm.LatentDiffusion
         | 
| 4 | 
            +
              params:
         | 
| 5 | 
            +
                linear_start: 0.0015
         | 
| 6 | 
            +
                linear_end: 0.0195
         | 
| 7 | 
            +
                num_timesteps_cond: 1
         | 
| 8 | 
            +
                log_every_t: 200
         | 
| 9 | 
            +
                timesteps: 1000
         | 
| 10 | 
            +
                first_stage_key: image
         | 
| 11 | 
            +
                cond_stage_key: class_label
         | 
| 12 | 
            +
                image_size: 32
         | 
| 13 | 
            +
                channels: 4
         | 
| 14 | 
            +
                cond_stage_trainable: true
         | 
| 15 | 
            +
                conditioning_key: crossattn
         | 
| 16 | 
            +
                monitor: val/loss_simple_ema
         | 
| 17 | 
            +
                unet_config:
         | 
| 18 | 
            +
                  target: ldm.modules.diffusionmodules.openaimodel.UNetModel
         | 
| 19 | 
            +
                  params:
         | 
| 20 | 
            +
                    image_size: 32
         | 
| 21 | 
            +
                    in_channels: 4
         | 
| 22 | 
            +
                    out_channels: 4
         | 
| 23 | 
            +
                    model_channels: 256
         | 
| 24 | 
            +
                    attention_resolutions:
         | 
| 25 | 
            +
                    #note: this isn\t actually the resolution but
         | 
| 26 | 
            +
                    # the downsampling factor, i.e. this corresnponds to
         | 
| 27 | 
            +
                    # attention on spatial resolution 8,16,32, as the
         | 
| 28 | 
            +
                    # spatial reolution of the latents is 32 for f8
         | 
| 29 | 
            +
                    - 4
         | 
| 30 | 
            +
                    - 2
         | 
| 31 | 
            +
                    - 1
         | 
| 32 | 
            +
                    num_res_blocks: 2
         | 
| 33 | 
            +
                    channel_mult:
         | 
| 34 | 
            +
                    - 1
         | 
| 35 | 
            +
                    - 2
         | 
| 36 | 
            +
                    - 4
         | 
| 37 | 
            +
                    num_head_channels: 32
         | 
| 38 | 
            +
                    use_spatial_transformer: true
         | 
| 39 | 
            +
                    transformer_depth: 1
         | 
| 40 | 
            +
                    context_dim: 512
         | 
| 41 | 
            +
                first_stage_config:
         | 
| 42 | 
            +
                  target: ldm.models.autoencoder.VQModelInterface
         | 
| 43 | 
            +
                  params:
         | 
| 44 | 
            +
                    embed_dim: 4
         | 
| 45 | 
            +
                    n_embed: 16384
         | 
| 46 | 
            +
                    ckpt_path: configs/first_stage_models/vq-f8/model.yaml
         | 
| 47 | 
            +
                    ddconfig:
         | 
| 48 | 
            +
                      double_z: false
         | 
| 49 | 
            +
                      z_channels: 4
         | 
| 50 | 
            +
                      resolution: 256
         | 
| 51 | 
            +
                      in_channels: 3
         | 
| 52 | 
            +
                      out_ch: 3
         | 
| 53 | 
            +
                      ch: 128
         | 
| 54 | 
            +
                      ch_mult:
         | 
| 55 | 
            +
                      - 1
         | 
| 56 | 
            +
                      - 2
         | 
| 57 | 
            +
                      - 2
         | 
| 58 | 
            +
                      - 4
         | 
| 59 | 
            +
                      num_res_blocks: 2
         | 
| 60 | 
            +
                      attn_resolutions:
         | 
| 61 | 
            +
                      - 32
         | 
| 62 | 
            +
                      dropout: 0.0
         | 
| 63 | 
            +
                    lossconfig:
         | 
| 64 | 
            +
                      target: torch.nn.Identity
         | 
| 65 | 
            +
                cond_stage_config:
         | 
| 66 | 
            +
                  target: ldm.modules.encoders.modules.ClassEmbedder
         | 
| 67 | 
            +
                  params:
         | 
| 68 | 
            +
                    embed_dim: 512
         | 
| 69 | 
            +
                    key: class_label
         | 
| 70 | 
            +
            data:
         | 
| 71 | 
            +
              target: main.DataModuleFromConfig
         | 
| 72 | 
            +
              params:
         | 
| 73 | 
            +
                batch_size: 64
         | 
| 74 | 
            +
                num_workers: 12
         | 
| 75 | 
            +
                wrap: false
         | 
| 76 | 
            +
                train:
         | 
| 77 | 
            +
                  target: ldm.data.imagenet.ImageNetTrain
         | 
| 78 | 
            +
                  params:
         | 
| 79 | 
            +
                    config:
         | 
| 80 | 
            +
                      size: 256
         | 
| 81 | 
            +
                validation:
         | 
| 82 | 
            +
                  target: ldm.data.imagenet.ImageNetValidation
         | 
| 83 | 
            +
                  params:
         | 
| 84 | 
            +
                    config:
         | 
| 85 | 
            +
                      size: 256
         | 
| 86 | 
            +
             | 
| 87 | 
            +
             | 
| 88 | 
            +
            lightning:
         | 
| 89 | 
            +
              callbacks:
         | 
| 90 | 
            +
                image_logger:
         | 
| 91 | 
            +
                  target: main.ImageLogger
         | 
| 92 | 
            +
                  params:
         | 
| 93 | 
            +
                    batch_frequency: 5000
         | 
| 94 | 
            +
                    max_images: 8
         | 
| 95 | 
            +
                    increase_log_steps: False
         | 
| 96 | 
            +
             | 
| 97 | 
            +
              trainer:
         | 
| 98 | 
            +
                benchmark: True
         | 
    	
        configs/latent-diffusion/ffhq-ldm-vq-4.yaml
    ADDED
    
    | @@ -0,0 +1,85 @@ | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | 
|  | |
| 1 | 
            +
            model:
         | 
| 2 | 
            +
              base_learning_rate: 2.0e-06
         | 
| 3 | 
            +
              target: ldm.models.diffusion.ddpm.LatentDiffusion
         | 
| 4 | 
            +
              params:
         | 
| 5 | 
            +
                linear_start: 0.0015
         | 
| 6 | 
            +
                linear_end: 0.0195
         | 
| 7 | 
            +
                num_timesteps_cond: 1
         | 
| 8 | 
            +
                log_every_t: 200
         | 
| 9 | 
            +
                timesteps: 1000
         | 
| 10 | 
            +
                first_stage_key: image
         | 
| 11 | 
            +
                image_size: 64
         | 
| 12 | 
            +
                channels: 3
         | 
| 13 | 
            +
                monitor: val/loss_simple_ema
         | 
| 14 | 
            +
                unet_config:
         | 
| 15 | 
            +
                  target: ldm.modules.diffusionmodules.openaimodel.UNetModel
         | 
| 16 | 
            +
                  params:
         | 
| 17 | 
            +
                    image_size: 64
         | 
| 18 | 
            +
                    in_channels: 3
         | 
| 19 | 
            +
                    out_channels: 3
         | 
| 20 | 
            +
                    model_channels: 224
         | 
| 21 | 
            +
                    attention_resolutions:
         | 
| 22 | 
            +
                    # note: this isn\t actually the resolution but
         | 
| 23 | 
            +
                    # the downsampling factor, i.e. this corresnponds to
         | 
| 24 | 
            +
                    # attention on spatial resolution 8,16,32, as the
         | 
| 25 | 
            +
                    # spatial reolution of the latents is 64 for f4
         | 
| 26 | 
            +
                    - 8
         | 
| 27 | 
            +
                    - 4
         | 
| 28 | 
            +
                    - 2
         | 
| 29 | 
            +
                    num_res_blocks: 2
         | 
| 30 | 
            +
                    channel_mult:
         | 
| 31 | 
            +
                    - 1
         | 
| 32 | 
            +
                    - 2
         | 
| 33 | 
            +
                    - 3
         | 
| 34 | 
            +
                    - 4
         | 
| 35 | 
            +
                    num_head_channels: 32
         | 
| 36 | 
            +
                first_stage_config:
         | 
| 37 | 
            +
                  target: ldm.models.autoencoder.VQModelInterface
         | 
| 38 | 
            +
                  params:
         | 
| 39 | 
            +
                    embed_dim: 3
         | 
| 40 | 
            +
                    n_embed: 8192
         | 
| 41 | 
            +
                    ckpt_path: configs/first_stage_models/vq-f4/model.yaml
         | 
| 42 | 
            +
                    ddconfig:
         | 
| 43 | 
            +
                      double_z: false
         | 
| 44 | 
            +
                      z_channels: 3
         | 
| 45 | 
            +
                      resolution: 256
         | 
| 46 | 
            +
                      in_channels: 3
         | 
| 47 | 
            +
                      out_ch: 3
         | 
| 48 | 
            +
                      ch: 128
         | 
| 49 | 
            +
                      ch_mult:
         | 
| 50 | 
            +
                      - 1
         | 
| 51 | 
            +
                      - 2
         | 
| 52 | 
            +
                      - 4
         | 
| 53 | 
            +
                      num_res_blocks: 2
         | 
| 54 | 
            +
                      attn_resolutions: []
         | 
| 55 | 
            +
                      dropout: 0.0
         | 
| 56 | 
            +
                    lossconfig:
         | 
| 57 | 
            +
                      target: torch.nn.Identity
         | 
| 58 | 
            +
                cond_stage_config: __is_unconditional__
         | 
| 59 | 
            +
            data:
         | 
| 60 | 
            +
              target: main.DataModuleFromConfig
         | 
| 61 | 
            +
              params:
         | 
| 62 | 
            +
                batch_size: 42
         | 
| 63 | 
            +
                num_workers: 5
         | 
| 64 | 
            +
                wrap: false
         | 
| 65 | 
            +
                train:
         | 
| 66 | 
            +
                  target: taming.data.faceshq.FFHQTrain
         | 
| 67 | 
            +
                  params:
         | 
| 68 | 
            +
                    size: 256
         | 
| 69 | 
            +
                validation:
         | 
| 70 | 
            +
                  target: taming.data.faceshq.FFHQValidation
         | 
| 71 | 
            +
                  params:
         | 
| 72 | 
            +
                    size: 256
         | 
| 73 | 
            +
             | 
| 74 | 
            +
             | 
| 75 | 
            +
            lightning:
         | 
| 76 | 
            +
              callbacks:
         | 
| 77 | 
            +
                image_logger:
         | 
| 78 | 
            +
                  target: main.ImageLogger
         | 
| 79 | 
            +
                  params:
         | 
| 80 | 
            +
                    batch_frequency: 5000
         | 
| 81 | 
            +
                    max_images: 8
         | 
| 82 | 
            +
                    increase_log_steps: False
         | 
| 83 | 
            +
             | 
| 84 | 
            +
              trainer:
         | 
| 85 | 
            +
                benchmark: True
         | 
    	
        configs/latent-diffusion/lsun_bedrooms-ldm-vq-4.yaml
    ADDED
    
    | @@ -0,0 +1,85 @@ | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | 
|  | |
| 1 | 
            +
            model:
         | 
| 2 | 
            +
              base_learning_rate: 2.0e-06
         | 
| 3 | 
            +
              target: ldm.models.diffusion.ddpm.LatentDiffusion
         | 
| 4 | 
            +
              params:
         | 
| 5 | 
            +
                linear_start: 0.0015
         | 
| 6 | 
            +
                linear_end: 0.0195
         | 
| 7 | 
            +
                num_timesteps_cond: 1
         | 
| 8 | 
            +
                log_every_t: 200
         | 
| 9 | 
            +
                timesteps: 1000
         | 
| 10 | 
            +
                first_stage_key: image
         | 
| 11 | 
            +
                image_size: 64
         | 
| 12 | 
            +
                channels: 3
         | 
| 13 | 
            +
                monitor: val/loss_simple_ema
         | 
| 14 | 
            +
                unet_config:
         | 
| 15 | 
            +
                  target: ldm.modules.diffusionmodules.openaimodel.UNetModel
         | 
| 16 | 
            +
                  params:
         | 
| 17 | 
            +
                    image_size: 64
         | 
| 18 | 
            +
                    in_channels: 3
         | 
| 19 | 
            +
                    out_channels: 3
         | 
| 20 | 
            +
                    model_channels: 224
         | 
| 21 | 
            +
                    attention_resolutions:
         | 
| 22 | 
            +
                    # note: this isn\t actually the resolution but
         | 
| 23 | 
            +
                    # the downsampling factor, i.e. this corresnponds to
         | 
| 24 | 
            +
                    # attention on spatial resolution 8,16,32, as the
         | 
| 25 | 
            +
                    # spatial reolution of the latents is 64 for f4
         | 
| 26 | 
            +
                    - 8
         | 
| 27 | 
            +
                    - 4
         | 
| 28 | 
            +
                    - 2
         | 
| 29 | 
            +
                    num_res_blocks: 2
         | 
| 30 | 
            +
                    channel_mult:
         | 
| 31 | 
            +
                    - 1
         | 
| 32 | 
            +
                    - 2
         | 
| 33 | 
            +
                    - 3
         | 
| 34 | 
            +
                    - 4
         | 
| 35 | 
            +
                    num_head_channels: 32
         | 
| 36 | 
            +
                first_stage_config:
         | 
| 37 | 
            +
                  target: ldm.models.autoencoder.VQModelInterface
         | 
| 38 | 
            +
                  params:
         | 
| 39 | 
            +
                    ckpt_path: configs/first_stage_models/vq-f4/model.yaml
         | 
| 40 | 
            +
                    embed_dim: 3
         | 
| 41 | 
            +
                    n_embed: 8192
         | 
| 42 | 
            +
                    ddconfig:
         | 
| 43 | 
            +
                      double_z: false
         | 
| 44 | 
            +
                      z_channels: 3
         | 
| 45 | 
            +
                      resolution: 256
         | 
| 46 | 
            +
                      in_channels: 3
         | 
| 47 | 
            +
                      out_ch: 3
         | 
| 48 | 
            +
                      ch: 128
         | 
| 49 | 
            +
                      ch_mult:
         | 
| 50 | 
            +
                      - 1
         | 
| 51 | 
            +
                      - 2
         | 
| 52 | 
            +
                      - 4
         | 
| 53 | 
            +
                      num_res_blocks: 2
         | 
| 54 | 
            +
                      attn_resolutions: []
         | 
| 55 | 
            +
                      dropout: 0.0
         | 
| 56 | 
            +
                    lossconfig:
         | 
| 57 | 
            +
                      target: torch.nn.Identity
         | 
| 58 | 
            +
                cond_stage_config: __is_unconditional__
         | 
| 59 | 
            +
            data:
         | 
| 60 | 
            +
              target: main.DataModuleFromConfig
         | 
| 61 | 
            +
              params:
         | 
| 62 | 
            +
                batch_size: 48
         | 
| 63 | 
            +
                num_workers: 5
         | 
| 64 | 
            +
                wrap: false
         | 
| 65 | 
            +
                train:
         | 
| 66 | 
            +
                  target: ldm.data.lsun.LSUNBedroomsTrain
         | 
| 67 | 
            +
                  params:
         | 
| 68 | 
            +
                    size: 256
         | 
| 69 | 
            +
                validation:
         | 
| 70 | 
            +
                  target: ldm.data.lsun.LSUNBedroomsValidation
         | 
| 71 | 
            +
                  params:
         | 
| 72 | 
            +
                    size: 256
         | 
| 73 | 
            +
             | 
| 74 | 
            +
             | 
| 75 | 
            +
            lightning:
         | 
| 76 | 
            +
              callbacks:
         | 
| 77 | 
            +
                image_logger:
         | 
| 78 | 
            +
                  target: main.ImageLogger
         | 
| 79 | 
            +
                  params:
         | 
| 80 | 
            +
                    batch_frequency: 5000
         | 
| 81 | 
            +
                    max_images: 8
         | 
| 82 | 
            +
                    increase_log_steps: False
         | 
| 83 | 
            +
             | 
| 84 | 
            +
              trainer:
         | 
| 85 | 
            +
                benchmark: True
         | 
    	
        configs/latent-diffusion/{lsun_churches_f8-autoencoder-ldm.yaml → lsun_churches-ldm-kl-8.yaml}
    RENAMED
    
    | @@ -45,7 +45,7 @@ model: | |
| 45 | 
             
                  params:
         | 
| 46 | 
             
                    embed_dim: 4
         | 
| 47 | 
             
                    monitor: "val/rec_loss"
         | 
| 48 | 
            -
                    ckpt_path: "/ | 
| 49 | 
             
                    ddconfig:
         | 
| 50 | 
             
                      double_z: True
         | 
| 51 | 
             
                      z_channels: 4
         | 
| @@ -65,7 +65,7 @@ model: | |
| 65 | 
             
            data:
         | 
| 66 | 
             
              target: main.DataModuleFromConfig
         | 
| 67 | 
             
              params:
         | 
| 68 | 
            -
                batch_size:  | 
| 69 | 
             
                num_workers: 5
         | 
| 70 | 
             
                wrap: False
         | 
| 71 | 
             
                train:
         | 
| @@ -82,14 +82,10 @@ lightning: | |
| 82 | 
             
                image_logger:
         | 
| 83 | 
             
                  target: main.ImageLogger
         | 
| 84 | 
             
                  params:
         | 
| 85 | 
            -
                    batch_frequency:  | 
| 86 | 
             
                    max_images: 8
         | 
| 87 | 
             
                    increase_log_steps: False
         | 
| 88 |  | 
| 89 | 
            -
                metrics_over_trainsteps_checkpoint:
         | 
| 90 | 
            -
                  target: pytorch_lightning.callbacks.ModelCheckpoint
         | 
| 91 | 
            -
                  params:
         | 
| 92 | 
            -
                    every_n_train_steps: 20000
         | 
| 93 |  | 
| 94 | 
             
              trainer:
         | 
| 95 | 
             
                benchmark: True
         | 
|  | |
| 45 | 
             
                  params:
         | 
| 46 | 
             
                    embed_dim: 4
         | 
| 47 | 
             
                    monitor: "val/rec_loss"
         | 
| 48 | 
            +
                    ckpt_path: "models/first_stage_models/kl-f8/model.ckpt"
         | 
| 49 | 
             
                    ddconfig:
         | 
| 50 | 
             
                      double_z: True
         | 
| 51 | 
             
                      z_channels: 4
         | 
|  | |
| 65 | 
             
            data:
         | 
| 66 | 
             
              target: main.DataModuleFromConfig
         | 
| 67 | 
             
              params:
         | 
| 68 | 
            +
                batch_size: 96
         | 
| 69 | 
             
                num_workers: 5
         | 
| 70 | 
             
                wrap: False
         | 
| 71 | 
             
                train:
         | 
|  | |
| 82 | 
             
                image_logger:
         | 
| 83 | 
             
                  target: main.ImageLogger
         | 
| 84 | 
             
                  params:
         | 
| 85 | 
            +
                    batch_frequency: 5000
         | 
| 86 | 
             
                    max_images: 8
         | 
| 87 | 
             
                    increase_log_steps: False
         | 
| 88 |  | 
|  | |
|  | |
|  | |
|  | |
| 89 |  | 
| 90 | 
             
              trainer:
         | 
| 91 | 
             
                benchmark: True
         | 
    	
        ldm/models/diffusion/ddim.py
    CHANGED
    
    | @@ -5,8 +5,7 @@ import numpy as np | |
| 5 | 
             
            from tqdm import tqdm
         | 
| 6 | 
             
            from functools import partial
         | 
| 7 |  | 
| 8 | 
            -
            from ldm. | 
| 9 | 
            -
            from ldm.modules.diffusionmodules.util import make_ddim_sampling_parameters, make_ddim_timesteps
         | 
| 10 |  | 
| 11 |  | 
| 12 | 
             
            class DDIMSampler(object):
         | 
| @@ -27,8 +26,7 @@ class DDIMSampler(object): | |
| 27 | 
             
                                                              num_ddpm_timesteps=self.ddpm_num_timesteps,verbose=verbose)
         | 
| 28 | 
             
                    alphas_cumprod = self.model.alphas_cumprod
         | 
| 29 | 
             
                    assert alphas_cumprod.shape[0] == self.ddpm_num_timesteps, 'alphas have to be defined for each timestep'
         | 
| 30 | 
            -
             | 
| 31 | 
            -
                    to_torch = partial(torch.tensor, dtype=torch.float32, device=self.model.device)
         | 
| 32 |  | 
| 33 | 
             
                    self.register_buffer('betas', to_torch(self.model.betas))
         | 
| 34 | 
             
                    self.register_buffer('alphas_cumprod', to_torch(alphas_cumprod))
         | 
| @@ -73,7 +71,8 @@ class DDIMSampler(object): | |
| 73 | 
             
                           corrector_kwargs=None,
         | 
| 74 | 
             
                           verbose=True,
         | 
| 75 | 
             
                           x_T=None,
         | 
| 76 | 
            -
                           log_every_t=100
         | 
|  | |
| 77 | 
             
                           ):
         | 
| 78 | 
             
                    if conditioning is not None:
         | 
| 79 | 
             
                        if isinstance(conditioning, dict):
         | 
|  | |
| 5 | 
             
            from tqdm import tqdm
         | 
| 6 | 
             
            from functools import partial
         | 
| 7 |  | 
| 8 | 
            +
            from ldm.modules.diffusionmodules.util import make_ddim_sampling_parameters, make_ddim_timesteps, noise_like
         | 
|  | |
| 9 |  | 
| 10 |  | 
| 11 | 
             
            class DDIMSampler(object):
         | 
|  | |
| 26 | 
             
                                                              num_ddpm_timesteps=self.ddpm_num_timesteps,verbose=verbose)
         | 
| 27 | 
             
                    alphas_cumprod = self.model.alphas_cumprod
         | 
| 28 | 
             
                    assert alphas_cumprod.shape[0] == self.ddpm_num_timesteps, 'alphas have to be defined for each timestep'
         | 
| 29 | 
            +
                    to_torch = lambda x: x.clone().detach().to(torch.float32).to(self.model.device)
         | 
|  | |
| 30 |  | 
| 31 | 
             
                    self.register_buffer('betas', to_torch(self.model.betas))
         | 
| 32 | 
             
                    self.register_buffer('alphas_cumprod', to_torch(alphas_cumprod))
         | 
|  | |
| 71 | 
             
                           corrector_kwargs=None,
         | 
| 72 | 
             
                           verbose=True,
         | 
| 73 | 
             
                           x_T=None,
         | 
| 74 | 
            +
                           log_every_t=100,
         | 
| 75 | 
            +
                           **kwargs
         | 
| 76 | 
             
                           ):
         | 
| 77 | 
             
                    if conditioning is not None:
         | 
| 78 | 
             
                        if isinstance(conditioning, dict):
         | 
    	
        ldm/models/diffusion/ddpm.py
    CHANGED
    
    | @@ -16,14 +16,14 @@ from contextlib import contextmanager | |
| 16 | 
             
            from functools import partial
         | 
| 17 | 
             
            from tqdm import tqdm
         | 
| 18 | 
             
            from torchvision.utils import make_grid
         | 
| 19 | 
            -
            from PIL import Image
         | 
| 20 | 
             
            from pytorch_lightning.utilities.distributed import rank_zero_only
         | 
| 21 |  | 
| 22 | 
             
            from ldm.util import log_txt_as_img, exists, default, ismap, isimage, mean_flat, count_params, instantiate_from_config
         | 
| 23 | 
             
            from ldm.modules.ema import LitEma
         | 
| 24 | 
             
            from ldm.modules.distributions.distributions import normal_kl, DiagonalGaussianDistribution
         | 
| 25 | 
             
            from ldm.models.autoencoder import VQModelInterface, IdentityFirstStage, AutoencoderKL
         | 
| 26 | 
            -
            from ldm.modules.diffusionmodules.util import make_beta_schedule, extract_into_tensor
         | 
|  | |
| 27 |  | 
| 28 |  | 
| 29 | 
             
            __conditioning_keys__ = {'concat': 'c_concat',
         | 
| @@ -37,12 +37,6 @@ def disabled_train(self, mode=True): | |
| 37 | 
             
                return self
         | 
| 38 |  | 
| 39 |  | 
| 40 | 
            -
            def noise_like(shape, device, repeat=False):
         | 
| 41 | 
            -
                repeat_noise = lambda: torch.randn((1, *shape[1:]), device=device).repeat(shape[0], *((1,) * (len(shape) - 1)))
         | 
| 42 | 
            -
                noise = lambda: torch.randn(shape, device=device)
         | 
| 43 | 
            -
                return repeat_noise() if repeat else noise()
         | 
| 44 | 
            -
             | 
| 45 | 
            -
             | 
| 46 | 
             
            def uniform_on_device(r1, r2, shape, device):
         | 
| 47 | 
             
                return (r1 - r2) * torch.rand(*shape, device=device) + r2
         | 
| 48 |  | 
| @@ -119,6 +113,7 @@ class DDPM(pl.LightningModule): | |
| 119 | 
             
                    if self.learn_logvar:
         | 
| 120 | 
             
                        self.logvar = nn.Parameter(self.logvar, requires_grad=True)
         | 
| 121 |  | 
|  | |
| 122 | 
             
                def register_schedule(self, given_betas=None, beta_schedule="linear", timesteps=1000,
         | 
| 123 | 
             
                                      linear_start=1e-4, linear_end=2e-2, cosine_s=8e-3):
         | 
| 124 | 
             
                    if exists(given_betas):
         | 
| @@ -1188,7 +1183,6 @@ class LatentDiffusion(DDPM): | |
| 1188 |  | 
| 1189 | 
             
                    if start_T is not None:
         | 
| 1190 | 
             
                        timesteps = min(timesteps, start_T)
         | 
| 1191 | 
            -
                    print(timesteps, start_T)
         | 
| 1192 | 
             
                    iterator = tqdm(reversed(range(0, timesteps)), desc='Sampling t', total=timesteps) if verbose else reversed(
         | 
| 1193 | 
             
                        range(0, timesteps))
         | 
| 1194 |  | 
| @@ -1222,7 +1216,7 @@ class LatentDiffusion(DDPM): | |
| 1222 | 
             
                @torch.no_grad()
         | 
| 1223 | 
             
                def sample(self, cond, batch_size=16, return_intermediates=False, x_T=None,
         | 
| 1224 | 
             
                           verbose=True, timesteps=None, quantize_denoised=False,
         | 
| 1225 | 
            -
                           mask=None, x0=None, shape=None):
         | 
| 1226 | 
             
                    if shape is None:
         | 
| 1227 | 
             
                        shape = (batch_size, self.channels, self.image_size, self.image_size)
         | 
| 1228 | 
             
                    if cond is not None:
         | 
| @@ -1238,10 +1232,28 @@ class LatentDiffusion(DDPM): | |
| 1238 | 
             
                                              mask=mask, x0=x0)
         | 
| 1239 |  | 
| 1240 | 
             
                @torch.no_grad()
         | 
| 1241 | 
            -
                def  | 
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
| 1242 | 
             
                               quantize_denoised=True, inpaint=True, plot_denoise_rows=False, plot_progressive_rows=True,
         | 
| 1243 | 
             
                               plot_diffusion_rows=True, **kwargs):
         | 
| 1244 | 
            -
             | 
|  | |
|  | |
| 1245 | 
             
                    log = dict()
         | 
| 1246 | 
             
                    z, c, x, xrec, xc = self.get_input(batch, self.first_stage_key,
         | 
| 1247 | 
             
                                                       return_first_stage_outputs=True,
         | 
| @@ -1288,7 +1300,9 @@ class LatentDiffusion(DDPM): | |
| 1288 | 
             
                    if sample:
         | 
| 1289 | 
             
                        # get denoise row
         | 
| 1290 | 
             
                        with self.ema_scope("Plotting"):
         | 
| 1291 | 
            -
                            samples, z_denoise_row = self. | 
|  | |
|  | |
| 1292 | 
             
                        x_samples = self.decode_first_stage(samples)
         | 
| 1293 | 
             
                        log["samples"] = x_samples
         | 
| 1294 | 
             
                        if plot_denoise_rows:
         | 
| @@ -1299,8 +1313,11 @@ class LatentDiffusion(DDPM): | |
| 1299 | 
             
                                self.first_stage_model, IdentityFirstStage):
         | 
| 1300 | 
             
                            # also display when quantizing x0 while sampling
         | 
| 1301 | 
             
                            with self.ema_scope("Plotting Quantized Denoised"):
         | 
| 1302 | 
            -
                                samples, z_denoise_row = self. | 
| 1303 | 
            -
             | 
|  | |
|  | |
|  | |
| 1304 | 
             
                            x_samples = self.decode_first_stage(samples.to(self.device))
         | 
| 1305 | 
             
                            log["samples_x0_quantized"] = x_samples
         | 
| 1306 |  | 
| @@ -1312,19 +1329,17 @@ class LatentDiffusion(DDPM): | |
| 1312 | 
             
                            mask[:, h // 4:3 * h // 4, w // 4:3 * w // 4] = 0.
         | 
| 1313 | 
             
                            mask = mask[:, None, ...]
         | 
| 1314 | 
             
                            with self.ema_scope("Plotting Inpaint"):
         | 
| 1315 | 
            -
             | 
| 1316 | 
            -
             | 
|  | |
| 1317 | 
             
                            x_samples = self.decode_first_stage(samples.to(self.device))
         | 
| 1318 | 
             
                            log["samples_inpainting"] = x_samples
         | 
| 1319 | 
             
                            log["mask"] = mask
         | 
| 1320 | 
            -
                            if plot_denoise_rows:
         | 
| 1321 | 
            -
                                denoise_grid = self._get_denoise_row_from_list(z_denoise_row)
         | 
| 1322 | 
            -
                                log["denoise_row_inpainting"] = denoise_grid
         | 
| 1323 |  | 
| 1324 | 
             
                            # outpaint
         | 
| 1325 | 
             
                            with self.ema_scope("Plotting Outpaint"):
         | 
| 1326 | 
            -
                                samples = self. | 
| 1327 | 
            -
             | 
| 1328 | 
             
                            x_samples = self.decode_first_stage(samples.to(self.device))
         | 
| 1329 | 
             
                            log["samples_outpainting"] = x_samples
         | 
| 1330 |  | 
|  | |
| 16 | 
             
            from functools import partial
         | 
| 17 | 
             
            from tqdm import tqdm
         | 
| 18 | 
             
            from torchvision.utils import make_grid
         | 
|  | |
| 19 | 
             
            from pytorch_lightning.utilities.distributed import rank_zero_only
         | 
| 20 |  | 
| 21 | 
             
            from ldm.util import log_txt_as_img, exists, default, ismap, isimage, mean_flat, count_params, instantiate_from_config
         | 
| 22 | 
             
            from ldm.modules.ema import LitEma
         | 
| 23 | 
             
            from ldm.modules.distributions.distributions import normal_kl, DiagonalGaussianDistribution
         | 
| 24 | 
             
            from ldm.models.autoencoder import VQModelInterface, IdentityFirstStage, AutoencoderKL
         | 
| 25 | 
            +
            from ldm.modules.diffusionmodules.util import make_beta_schedule, extract_into_tensor, noise_like
         | 
| 26 | 
            +
            from ldm.models.diffusion.ddim import DDIMSampler
         | 
| 27 |  | 
| 28 |  | 
| 29 | 
             
            __conditioning_keys__ = {'concat': 'c_concat',
         | 
|  | |
| 37 | 
             
                return self
         | 
| 38 |  | 
| 39 |  | 
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
| 40 | 
             
            def uniform_on_device(r1, r2, shape, device):
         | 
| 41 | 
             
                return (r1 - r2) * torch.rand(*shape, device=device) + r2
         | 
| 42 |  | 
|  | |
| 113 | 
             
                    if self.learn_logvar:
         | 
| 114 | 
             
                        self.logvar = nn.Parameter(self.logvar, requires_grad=True)
         | 
| 115 |  | 
| 116 | 
            +
             | 
| 117 | 
             
                def register_schedule(self, given_betas=None, beta_schedule="linear", timesteps=1000,
         | 
| 118 | 
             
                                      linear_start=1e-4, linear_end=2e-2, cosine_s=8e-3):
         | 
| 119 | 
             
                    if exists(given_betas):
         | 
|  | |
| 1183 |  | 
| 1184 | 
             
                    if start_T is not None:
         | 
| 1185 | 
             
                        timesteps = min(timesteps, start_T)
         | 
|  | |
| 1186 | 
             
                    iterator = tqdm(reversed(range(0, timesteps)), desc='Sampling t', total=timesteps) if verbose else reversed(
         | 
| 1187 | 
             
                        range(0, timesteps))
         | 
| 1188 |  | 
|  | |
| 1216 | 
             
                @torch.no_grad()
         | 
| 1217 | 
             
                def sample(self, cond, batch_size=16, return_intermediates=False, x_T=None,
         | 
| 1218 | 
             
                           verbose=True, timesteps=None, quantize_denoised=False,
         | 
| 1219 | 
            +
                           mask=None, x0=None, shape=None,**kwargs):
         | 
| 1220 | 
             
                    if shape is None:
         | 
| 1221 | 
             
                        shape = (batch_size, self.channels, self.image_size, self.image_size)
         | 
| 1222 | 
             
                    if cond is not None:
         | 
|  | |
| 1232 | 
             
                                              mask=mask, x0=x0)
         | 
| 1233 |  | 
| 1234 | 
             
                @torch.no_grad()
         | 
| 1235 | 
            +
                def sample_log(self,cond,batch_size,ddim, ddim_steps,**kwargs):
         | 
| 1236 | 
            +
             | 
| 1237 | 
            +
                    if ddim:
         | 
| 1238 | 
            +
                        ddim_sampler = DDIMSampler(self)
         | 
| 1239 | 
            +
                        shape = (self.channels, self.image_size, self.image_size)
         | 
| 1240 | 
            +
                        samples, intermediates =ddim_sampler.sample(ddim_steps,batch_size,
         | 
| 1241 | 
            +
                                                                    shape,cond,verbose=False,**kwargs)
         | 
| 1242 | 
            +
             | 
| 1243 | 
            +
                    else:
         | 
| 1244 | 
            +
                        samples, intermediates = self.sample(cond=cond, batch_size=batch_size,
         | 
| 1245 | 
            +
                                                             return_intermediates=True,**kwargs)
         | 
| 1246 | 
            +
             | 
| 1247 | 
            +
                    return samples, intermediates
         | 
| 1248 | 
            +
             | 
| 1249 | 
            +
             | 
| 1250 | 
            +
                @torch.no_grad()
         | 
| 1251 | 
            +
                def log_images(self, batch, N=8, n_row=4, sample=True, ddim_steps=200, ddim_eta=1., return_keys=None,
         | 
| 1252 | 
             
                               quantize_denoised=True, inpaint=True, plot_denoise_rows=False, plot_progressive_rows=True,
         | 
| 1253 | 
             
                               plot_diffusion_rows=True, **kwargs):
         | 
| 1254 | 
            +
             | 
| 1255 | 
            +
                    use_ddim = ddim_steps is not None
         | 
| 1256 | 
            +
             | 
| 1257 | 
             
                    log = dict()
         | 
| 1258 | 
             
                    z, c, x, xrec, xc = self.get_input(batch, self.first_stage_key,
         | 
| 1259 | 
             
                                                       return_first_stage_outputs=True,
         | 
|  | |
| 1300 | 
             
                    if sample:
         | 
| 1301 | 
             
                        # get denoise row
         | 
| 1302 | 
             
                        with self.ema_scope("Plotting"):
         | 
| 1303 | 
            +
                            samples, z_denoise_row = self.sample_log(cond=c,batch_size=N,ddim=use_ddim,
         | 
| 1304 | 
            +
                                                                     ddim_steps=ddim_steps,eta=ddim_eta)
         | 
| 1305 | 
            +
                            # samples, z_denoise_row = self.sample(cond=c, batch_size=N, return_intermediates=True)
         | 
| 1306 | 
             
                        x_samples = self.decode_first_stage(samples)
         | 
| 1307 | 
             
                        log["samples"] = x_samples
         | 
| 1308 | 
             
                        if plot_denoise_rows:
         | 
|  | |
| 1313 | 
             
                                self.first_stage_model, IdentityFirstStage):
         | 
| 1314 | 
             
                            # also display when quantizing x0 while sampling
         | 
| 1315 | 
             
                            with self.ema_scope("Plotting Quantized Denoised"):
         | 
| 1316 | 
            +
                                samples, z_denoise_row = self.sample_log(cond=c,batch_size=N,ddim=use_ddim,
         | 
| 1317 | 
            +
                                                                         ddim_steps=ddim_steps,eta=ddim_eta,
         | 
| 1318 | 
            +
                                                                         quantize_denoised=True)
         | 
| 1319 | 
            +
                                # samples, z_denoise_row = self.sample(cond=c, batch_size=N, return_intermediates=True,
         | 
| 1320 | 
            +
                                #                                      quantize_denoised=True)
         | 
| 1321 | 
             
                            x_samples = self.decode_first_stage(samples.to(self.device))
         | 
| 1322 | 
             
                            log["samples_x0_quantized"] = x_samples
         | 
| 1323 |  | 
|  | |
| 1329 | 
             
                            mask[:, h // 4:3 * h // 4, w // 4:3 * w // 4] = 0.
         | 
| 1330 | 
             
                            mask = mask[:, None, ...]
         | 
| 1331 | 
             
                            with self.ema_scope("Plotting Inpaint"):
         | 
| 1332 | 
            +
             | 
| 1333 | 
            +
                                samples, _ = self.sample_log(cond=c,batch_size=N,ddim=use_ddim, eta=ddim_eta,
         | 
| 1334 | 
            +
                                                            ddim_steps=ddim_steps, x0=z[:N], mask=mask)
         | 
| 1335 | 
             
                            x_samples = self.decode_first_stage(samples.to(self.device))
         | 
| 1336 | 
             
                            log["samples_inpainting"] = x_samples
         | 
| 1337 | 
             
                            log["mask"] = mask
         | 
|  | |
|  | |
|  | |
| 1338 |  | 
| 1339 | 
             
                            # outpaint
         | 
| 1340 | 
             
                            with self.ema_scope("Plotting Outpaint"):
         | 
| 1341 | 
            +
                                samples, _ = self.sample_log(cond=c, batch_size=N, ddim=use_ddim,eta=ddim_eta,
         | 
| 1342 | 
            +
                                                            ddim_steps=ddim_steps, x0=z[:N], mask=mask)
         | 
| 1343 | 
             
                            x_samples = self.decode_first_stage(samples.to(self.device))
         | 
| 1344 | 
             
                            log["samples_outpainting"] = x_samples
         | 
| 1345 |  | 
    	
        ldm/modules/diffusionmodules/util.py
    CHANGED
    
    | @@ -259,3 +259,9 @@ class HybridConditioner(nn.Module): | |
| 259 | 
             
                    c_concat = self.concat_conditioner(c_concat)
         | 
| 260 | 
             
                    c_crossattn = self.crossattn_conditioner(c_crossattn)
         | 
| 261 | 
             
                    return {'c_concat': [c_concat], 'c_crossattn': [c_crossattn]}
         | 
|  | |
|  | |
|  | |
|  | |
|  | |
|  | 
|  | |
| 259 | 
             
                    c_concat = self.concat_conditioner(c_concat)
         | 
| 260 | 
             
                    c_crossattn = self.crossattn_conditioner(c_crossattn)
         | 
| 261 | 
             
                    return {'c_concat': [c_concat], 'c_crossattn': [c_crossattn]}
         | 
| 262 | 
            +
             | 
| 263 | 
            +
             | 
| 264 | 
            +
            def noise_like(shape, device, repeat=False):
         | 
| 265 | 
            +
                repeat_noise = lambda: torch.randn((1, *shape[1:]), device=device).repeat(shape[0], *((1,) * (len(shape) - 1)))
         | 
| 266 | 
            +
                noise = lambda: torch.randn(shape, device=device)
         | 
| 267 | 
            +
                return repeat_noise() if repeat else noise()
         | 
    	
        main.py
    CHANGED
    
    | @@ -676,7 +676,10 @@ if __name__ == "__main__": | |
| 676 | 
             
                        ngpu = len(lightning_config.trainer.gpus.strip(",").split(','))
         | 
| 677 | 
             
                    else:
         | 
| 678 | 
             
                        ngpu = 1
         | 
| 679 | 
            -
                    accumulate_grad_batches  | 
|  | |
|  | |
|  | |
| 680 | 
             
                    print(f"accumulate_grad_batches = {accumulate_grad_batches}")
         | 
| 681 | 
             
                    lightning_config.trainer.accumulate_grad_batches = accumulate_grad_batches
         | 
| 682 | 
             
                    if opt.scale_lr:
         | 
|  | |
| 676 | 
             
                        ngpu = len(lightning_config.trainer.gpus.strip(",").split(','))
         | 
| 677 | 
             
                    else:
         | 
| 678 | 
             
                        ngpu = 1
         | 
| 679 | 
            +
                    if 'accumulate_grad_batches' in lightning_config.trainer:
         | 
| 680 | 
            +
                        accumulate_grad_batches = lightning_config.trainer.accumulate_grad_batches
         | 
| 681 | 
            +
                    else:
         | 
| 682 | 
            +
                        accumulate_grad_batches = 1
         | 
| 683 | 
             
                    print(f"accumulate_grad_batches = {accumulate_grad_batches}")
         | 
| 684 | 
             
                    lightning_config.trainer.accumulate_grad_batches = accumulate_grad_batches
         | 
| 685 | 
             
                    if opt.scale_lr:
         | 
    	
        models/ldm/semantic_synthesis256/config.yaml
    ADDED
    
    | @@ -0,0 +1,59 @@ | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | 
|  | |
| 1 | 
            +
            model:
         | 
| 2 | 
            +
              base_learning_rate: 1.0e-06
         | 
| 3 | 
            +
              target: ldm.models.diffusion.ddpm.LatentDiffusion
         | 
| 4 | 
            +
              params:
         | 
| 5 | 
            +
                linear_start: 0.0015
         | 
| 6 | 
            +
                linear_end: 0.0205
         | 
| 7 | 
            +
                log_every_t: 100
         | 
| 8 | 
            +
                timesteps: 1000
         | 
| 9 | 
            +
                loss_type: l1
         | 
| 10 | 
            +
                first_stage_key: image
         | 
| 11 | 
            +
                cond_stage_key: segmentation
         | 
| 12 | 
            +
                image_size: 64
         | 
| 13 | 
            +
                channels: 3
         | 
| 14 | 
            +
                concat_mode: true
         | 
| 15 | 
            +
                cond_stage_trainable: true
         | 
| 16 | 
            +
                unet_config:
         | 
| 17 | 
            +
                  target: ldm.modules.diffusionmodules.openaimodel.UNetModel
         | 
| 18 | 
            +
                  params:
         | 
| 19 | 
            +
                    image_size: 64
         | 
| 20 | 
            +
                    in_channels: 6
         | 
| 21 | 
            +
                    out_channels: 3
         | 
| 22 | 
            +
                    model_channels: 128
         | 
| 23 | 
            +
                    attention_resolutions:
         | 
| 24 | 
            +
                    - 32
         | 
| 25 | 
            +
                    - 16
         | 
| 26 | 
            +
                    - 8
         | 
| 27 | 
            +
                    num_res_blocks: 2
         | 
| 28 | 
            +
                    channel_mult:
         | 
| 29 | 
            +
                    - 1
         | 
| 30 | 
            +
                    - 4
         | 
| 31 | 
            +
                    - 8
         | 
| 32 | 
            +
                    num_heads: 8
         | 
| 33 | 
            +
                first_stage_config:
         | 
| 34 | 
            +
                  target: ldm.models.autoencoder.VQModelInterface
         | 
| 35 | 
            +
                  params:
         | 
| 36 | 
            +
                    embed_dim: 3
         | 
| 37 | 
            +
                    n_embed: 8192
         | 
| 38 | 
            +
                    ddconfig:
         | 
| 39 | 
            +
                      double_z: false
         | 
| 40 | 
            +
                      z_channels: 3
         | 
| 41 | 
            +
                      resolution: 256
         | 
| 42 | 
            +
                      in_channels: 3
         | 
| 43 | 
            +
                      out_ch: 3
         | 
| 44 | 
            +
                      ch: 128
         | 
| 45 | 
            +
                      ch_mult:
         | 
| 46 | 
            +
                      - 1
         | 
| 47 | 
            +
                      - 2
         | 
| 48 | 
            +
                      - 4
         | 
| 49 | 
            +
                      num_res_blocks: 2
         | 
| 50 | 
            +
                      attn_resolutions: []
         | 
| 51 | 
            +
                      dropout: 0.0
         | 
| 52 | 
            +
                    lossconfig:
         | 
| 53 | 
            +
                      target: torch.nn.Identity
         | 
| 54 | 
            +
                cond_stage_config:
         | 
| 55 | 
            +
                  target: ldm.modules.encoders.modules.SpatialRescaler
         | 
| 56 | 
            +
                  params:
         | 
| 57 | 
            +
                    n_stages: 2
         | 
| 58 | 
            +
                    in_channels: 182
         | 
| 59 | 
            +
                    out_channels: 3
         | 
    	
        scripts/download_first_stages.sh
    CHANGED
    
    | @@ -4,10 +4,10 @@ wget -O models/first_stage_models/kl-f8/model.zip https://ommer-lab.com/files/la | |
| 4 | 
             
            wget -O models/first_stage_models/kl-f16/model.zip https://ommer-lab.com/files/latent-diffusion/kl-f16.zip
         | 
| 5 | 
             
            wget -O models/first_stage_models/kl-f32/model.zip https://ommer-lab.com/files/latent-diffusion/kl-f32.zip
         | 
| 6 | 
             
            wget -O models/first_stage_models/vq-f4/model.zip https://ommer-lab.com/files/latent-diffusion/vq-f4.zip
         | 
| 7 | 
            -
            wget -O models/first_stage_models/vq-f4-noattn/model.zip https:// | 
| 8 | 
             
            wget -O models/first_stage_models/vq-f8/model.zip https://ommer-lab.com/files/latent-diffusion/vq-f8.zip
         | 
| 9 | 
             
            wget -O models/first_stage_models/vq-f8-n256/model.zip https://ommer-lab.com/files/latent-diffusion/vq-f8-n256.zip
         | 
| 10 | 
            -
            wget -O models/first_stage_models/vq-f16/model.zip https:// | 
| 11 |  | 
| 12 |  | 
| 13 |  | 
|  | |
| 4 | 
             
            wget -O models/first_stage_models/kl-f16/model.zip https://ommer-lab.com/files/latent-diffusion/kl-f16.zip
         | 
| 5 | 
             
            wget -O models/first_stage_models/kl-f32/model.zip https://ommer-lab.com/files/latent-diffusion/kl-f32.zip
         | 
| 6 | 
             
            wget -O models/first_stage_models/vq-f4/model.zip https://ommer-lab.com/files/latent-diffusion/vq-f4.zip
         | 
| 7 | 
            +
            wget -O models/first_stage_models/vq-f4-noattn/model.zip https://ommer-lab.com/files/latent-diffusion/vq-f4-noattn.zip
         | 
| 8 | 
             
            wget -O models/first_stage_models/vq-f8/model.zip https://ommer-lab.com/files/latent-diffusion/vq-f8.zip
         | 
| 9 | 
             
            wget -O models/first_stage_models/vq-f8-n256/model.zip https://ommer-lab.com/files/latent-diffusion/vq-f8-n256.zip
         | 
| 10 | 
            +
            wget -O models/first_stage_models/vq-f16/model.zip https://ommer-lab.com/files/latent-diffusion/vq-f16.zip
         | 
| 11 |  | 
| 12 |  | 
| 13 |  | 
    	
        scripts/download_models.sh
    CHANGED
    
    | @@ -6,9 +6,10 @@ wget -O models/ldm/lsun_beds256/lsun_beds-256.zip https://ommer-lab.com/files/la | |
| 6 | 
             
            wget -O models/ldm/text2img256/model.zip https://ommer-lab.com/files/latent-diffusion/text2img.zip
         | 
| 7 | 
             
            wget -O models/ldm/cin256/model.zip https://ommer-lab.com/files/latent-diffusion/cin.zip
         | 
| 8 | 
             
            wget -O models/ldm/semantic_synthesis512/model.zip https://ommer-lab.com/files/latent-diffusion/semantic_synthesis.zip
         | 
|  | |
| 9 | 
             
            wget -O models/ldm/bsr_sr/model.zip https://ommer-lab.com/files/latent-diffusion/sr_bsr.zip
         | 
| 10 | 
             
            wget -O models/ldm/layout2img-openimages256/model.zip https://ommer-lab.com/files/latent-diffusion/layout2img_model.zip
         | 
| 11 | 
            -
            wget -O models/ldm/inpainting_big/ | 
| 12 |  | 
| 13 |  | 
| 14 |  | 
| @@ -33,10 +34,16 @@ unzip -o model.zip | |
| 33 | 
             
            cd ../semantic_synthesis512
         | 
| 34 | 
             
            unzip -o model.zip
         | 
| 35 |  | 
|  | |
|  | |
|  | |
| 36 | 
             
            cd ../bsr_sr
         | 
| 37 | 
             
            unzip -o model.zip
         | 
| 38 |  | 
| 39 | 
             
            cd ../layout2img-openimages256
         | 
| 40 | 
             
            unzip -o model.zip
         | 
| 41 |  | 
|  | |
|  | |
|  | |
| 42 | 
             
            cd ../..
         | 
|  | |
| 6 | 
             
            wget -O models/ldm/text2img256/model.zip https://ommer-lab.com/files/latent-diffusion/text2img.zip
         | 
| 7 | 
             
            wget -O models/ldm/cin256/model.zip https://ommer-lab.com/files/latent-diffusion/cin.zip
         | 
| 8 | 
             
            wget -O models/ldm/semantic_synthesis512/model.zip https://ommer-lab.com/files/latent-diffusion/semantic_synthesis.zip
         | 
| 9 | 
            +
            wget -O models/ldm/semantic_synthesis256/model.zip https://ommer-lab.com/files/latent-diffusion/semantic_synthesis256.zip
         | 
| 10 | 
             
            wget -O models/ldm/bsr_sr/model.zip https://ommer-lab.com/files/latent-diffusion/sr_bsr.zip
         | 
| 11 | 
             
            wget -O models/ldm/layout2img-openimages256/model.zip https://ommer-lab.com/files/latent-diffusion/layout2img_model.zip
         | 
| 12 | 
            +
            wget -O models/ldm/inpainting_big/model.zip https://ommer-lab.com/files/latent-diffusion/inpainting_big.zip
         | 
| 13 |  | 
| 14 |  | 
| 15 |  | 
|  | |
| 34 | 
             
            cd ../semantic_synthesis512
         | 
| 35 | 
             
            unzip -o model.zip
         | 
| 36 |  | 
| 37 | 
            +
            cd ../semantic_synthesis256
         | 
| 38 | 
            +
            unzip -o model.zip
         | 
| 39 | 
            +
             | 
| 40 | 
             
            cd ../bsr_sr
         | 
| 41 | 
             
            unzip -o model.zip
         | 
| 42 |  | 
| 43 | 
             
            cd ../layout2img-openimages256
         | 
| 44 | 
             
            unzip -o model.zip
         | 
| 45 |  | 
| 46 | 
            +
            cd ../inpainting_big
         | 
| 47 | 
            +
            unzip -o model.zip
         | 
| 48 | 
            +
             | 
| 49 | 
             
            cd ../..
         | 
