Update README.md

README.md
---
license: apache-2.0
language:
- en
datasets:
- bitmind/AFHQ
- ILSVRC/imagenet-1k
pipeline_tag: unconditional-image-generation
---

# Transformer AutoRegressive Flow Model

TarFlow, proposed by Zhai et al. [1], builds its affine coupling layers from stacks of autoregressive Transformer blocks (similar to MAF), making each block a non-volume-preserving transformation; combined with guidance and denoising, it achieves state-of-the-art results across multiple benchmarks.

Let $z$ denote the noise direction and $x$ the image direction, both of size $(B, T, C)$, where $B$, $T$, and $C$ are the batch size, the patchified sequence length, and the feature dimension, respectively. In TarFlow, an autoregressive block can be written as:

\begin{equation}
\begin{aligned}
\text{Forward:}\quad z_t &= \exp(-s(x_{<t}))\,(x_t - u(x_{<t})),\\
\text{Inverse:}\quad x_t &= \exp(s(x_{<t}))\, z_t + u(x_{<t}).
\end{aligned}
\end{equation}
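
Here $u$ and $s$ are the shift and log-scale predicted by a causal network from the strict prefix $x_{<t}$. A minimal PyTorch sketch of one such block (illustrative names and a toy stand-in for the Transformer stack, not the authors' code):

```python
import torch

class TinyCausalNet(torch.nn.Module):
    """Toy stand-in for the Transformer stack: predicts the shift u and
    log-scale s at position t from the (strict) prefix x_{<t}."""
    def __init__(self, C):
        super().__init__()
        self.proj = torch.nn.Linear(C, 2 * C)

    def forward(self, x):
        # Shift right so position t only sees x_{t-1}, never x_t itself.
        prefix = torch.nn.functional.pad(x, (0, 0, 1, 0))[:, :-1]
        u, s = self.proj(prefix).chunk(2, dim=-1)
        return u, s

def coupling_forward(x, net):
    # One parallel pass: z_t = exp(-s(x_{<t})) * (x_t - u(x_{<t}))
    u, s = net(x)
    return torch.exp(-s) * (x - u)

def coupling_inverse(z, net):
    # Inversion is inherently sequential: x_t needs x_{<t} first,
    # so sampling costs T network evaluations per block.
    x = torch.zeros_like(z)
    for t in range(z.shape[1]):
        u, s = net(x)  # only the already-filled prefix matters at step t
        x[:, t] = torch.exp(s[:, t]) * z[:, t] + u[:, t]
    return x

B, T, C = 2, 8, 4
net = TinyCausalNet(C)
x = torch.randn(B, T, C)
with torch.no_grad():
    z = coupling_forward(x, net)
    print(torch.allclose(coupling_inverse(z, net), x, atol=1e-5))  # True
```

The forward direction is a single parallel pass, which keeps training efficient; the inverse is a token-by-token loop, which is where the sampling cost discussed below comes from.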

Its sampling process is extremely slow, and we want to accelerate it in []. In experiments, we found that the
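
To make the cost concrete, here is a hypothetical end-to-end sampler under the same toy setup as above (TarFlow alternates the autoregression direction between blocks, sketched here by flipping the token order):

```python
import torch

def sample(nets, z):
    """Invert each coupling block in turn; with K blocks and T tokens this
    costs K * T sequential network calls, versus K parallel calls for the
    forward (density) direction."""
    x = z
    for net in nets:
        x = coupling_inverse(x, net)   # T sequential evaluations
        x = torch.flip(x, dims=[1])    # reverse token order for the next block
    return x
```

For example, with $T = 256$ patch tokens and 8 coupling blocks, one sample takes 8 × 256 = 2048 sequential network evaluations; that gap between sampling and density evaluation is the bottleneck any acceleration of TarFlow has to attack.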

[1] Zhai S, Zhang R, Nakkiran P, et al. Normalizing flows are capable generative models. arXiv preprint arXiv:2412.06329, 2024.