Unconditional Image Generation · English

encoreus committed · verified
Commit 79c81c9 · 1 Parent(s): 62a7daa

Update README.md

Files changed (1)
  1. README.md +17 -10
README.md CHANGED
@@ -17,21 +17,28 @@ datasets:
  pipeline_tag: unconditional-image-generation
  ---

- # Transformer AutoRegressive Flow Model

  TarFlow, proposed by [Zhai et al., 2024], introduces stacks of autoregressive Transformer blocks (similar to MAF) into the construction of affine coupling layers to perform non-volume-preserving transformations; combined with guidance and denoising, it achieves state-of-the-art results across multiple benchmarks.
- Let $z$ denote the noise and $x$ the image, both of size $(B,T,C)$, where $B$, $T$, and $C$ are the batch size, the patchified sequence length, and the feature dimension, respectively. In a TarFlow model, an autoregressive block can be written as:

- \begin{equation}
- \begin{aligned}
- \text{Forward:\quad} z_t &= \exp(-s(x_{<t}))\,(x_t - u(x_{<t})),\\
- \text{Inverse:\quad} x_t &= \exp(s(x_{<t}))\, z_t + u(x_{<t}).
- \end{aligned}
- \end{equation}
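To make the two directions concrete, here is a minimal PyTorch sketch of one such block (an illustration, not the authors' implementation; the causal Transformer `net` returning the scale $s$ and shift $u$, and the helper `shift_right`, are hypothetical stand-ins). The network outputs are shifted right by one position so that position $t$ depends only on $x_{<t}$; note that the inverse must decode token by token:

```python
import torch

def shift_right(h):
    # Prepend zeros and drop the last position so the value at position t
    # carries information only from positions < t (strict causality).
    return torch.cat([torch.zeros_like(h[:, :1]), h[:, :-1]], dim=1)

def block_forward(x, net):
    # Forward (image -> noise): z_t = exp(-s(x_{<t})) * (x_t - u(x_{<t})).
    # `net` is a hypothetical causal Transformer mapping (B,T,C) to (s, u).
    s, u = net(x)
    s, u = shift_right(s), shift_right(u)
    return torch.exp(-s) * (x - u)

@torch.no_grad()
def block_inverse(z, net):
    # Inverse (noise -> image): x_t = exp(s(x_{<t})) * z_t + u(x_{<t}).
    # Each x_t depends on the already-decoded x_{<t}, so decoding is
    # sequential: one full Transformer pass per token.
    x = torch.zeros_like(z)
    for t in range(z.shape[1]):
        s, u = net(x)
        s, u = shift_right(s), shift_right(u)
        x[:, t] = torch.exp(s[:, t]) * z[:, t] + u[:, t]
    return x
```

The sequential loop in the inverse is exactly why sampling is slow, as the next paragraph notes.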

- Its sampling process is extremely slow, and we want to accelerate it in []. In experiments, we found that the

- [1] Zhai, S., Zhang, R., Nakkiran, P., et al. Normalizing Flows are Capable Generative Models. arXiv preprint arXiv:2412.06329, 2024.
 
 
  pipeline_tag: unconditional-image-generation
  ---

+ **Transformer AutoRegressive Flow Model**
+
  TarFlow, proposed by [Zhai et al., 2024], introduces stacks of autoregressive Transformer blocks (similar to MAF) into the construction of affine coupling layers to perform non-volume-preserving transformations; combined with guidance and denoising, it achieves state-of-the-art results across multiple benchmarks.

+ Its sampling process is extremely slow, and we want to accelerate it in []. Since the model parameters are not available from [Zhai et al., 2024], we retrain the TarFlow models and upload them here.
+
+ As mentioned in [Zhai et al., 2024], a TarFlow model can be denoted as P-Ch-T-K-pε, with patch size P, model channel size Ch, number of autoregressive flow blocks T, number of attention layers in each flow K, and the input noise variance pε that yields the best sampling quality for generation tasks (a parsing sketch for this naming scheme follows the list below).
+
+ We trained five models:
+
+ - AFHQ (256x256) conditional: afhq_model_8_768_8_8_0.07.pth
+ - ImageNet (128x128) conditional: imagenet_model_4_1024_8_8_0.15.pth
+ - ImageNet (64x64) unconditional: imagenet64_model_2_768_8_8_0.05.pth
+ - ImageNet (64x64) conditional: imagenet_model_2_768_8_8_0.05.pth
+ - ImageNet (64x64) conditional: imagenet_model_4_1024_8_8_0.05.pth
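As a small illustration of the naming scheme, the checkpoint filenames can be split back into the P-Ch-T-K-pε configuration (a sketch; the helper name and dictionary keys are illustrative, not part of any released code):

```python
from pathlib import Path

def parse_checkpoint_name(path):
    # Filenames follow <dataset>_model_<P>_<Ch>_<T>_<K>_<pε>.pth, e.g.
    # afhq_model_8_768_8_8_0.07.pth -> P=8, Ch=768, T=8, K=8, pε=0.07.
    dataset, fields = Path(path).stem.split("_model_")
    p, ch, t, k, p_eps = fields.split("_")
    return {
        "dataset": dataset,
        "patch_size": int(p),            # P
        "channels": int(ch),             # Ch
        "flow_blocks": int(t),           # T
        "layers_per_flow": int(k),       # K
        "noise_variance": float(p_eps),  # pε
    }

print(parse_checkpoint_name("afhq_model_8_768_8_8_0.07.pth"))
# -> {'dataset': 'afhq', 'patch_size': 8, 'channels': 768,
#     'flow_blocks': 8, 'layers_per_flow': 8, 'noise_variance': 0.07}
```

The checkpoints themselves are ordinary `.pth` files, so, assuming they are standard `torch.save` output, they should load with `torch.load(path, map_location="cpu")`.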

+ [1] Zhai, S., Zhang, R., Nakkiran, P., et al. Normalizing Flows are Capable Generative Models. arXiv preprint arXiv:2412.06329, 2024.
+ []