run_dir: .exp/diffusion/imagenet_512/dc_ae_f32c32_in_1.0/dit_xl_1/bs_1024_lr_2e-4_fp16
can not find a checkpoint, will train from scratch

Train Epoch #1:   0%|          | 0/20799 [00:00<?, ?it/s]
Train Epoch #1:   0%|          | 38/20799 [00:10<1:32:04,  3.76it/s, shape=torch.Size([32, 32, 16, 16]), global_step=38, grad_norm=0.245, lr=0.0002, loss=0.468]
Train Epoch #1:   0%|          | 84/20799 [00:20<1:21:50,  4.22it/s, shape=torch.Size([32, 32, 16, 16]), global_step=84, grad_norm=0.293, lr=0.0002, loss=0.35] 
Train Epoch #1:   1%|          | 130/20799 [00:30<1:18:55,  4.36it/s, shape=torch.Size([32, 32, 16, 16]), global_step=130, grad_norm=0.201, lr=0.0002, loss=0.309]
Train Epoch #1:   1%|          | 176/20799 [00:40<1:17:30,  4.43it/s, shape=torch.Size([32, 32, 16, 16]), global_step=176, grad_norm=0.209, lr=0.0002, loss=0.289]
Train Epoch #1:   1%|          | 222/20799 [00:50<1:16:40,  4.47it/s, shape=torch.Size([32, 32, 16, 16]), global_step=222, grad_norm=0.157, lr=0.0002, loss=0.277]
Train Epoch #1:   1%|▏         | 268/20799 [01:00<1:16:06,  4.50it/s, shape=torch.Size([32, 32, 16, 16]), global_step=268, grad_norm=0.17, lr=0.0002, loss=0.268] 
Train Epoch #1:   2%|▏         | 314/20799 [01:10<1:15:42,  4.51it/s, shape=torch.Size([32, 32, 16, 16]), global_step=314, grad_norm=0.18, lr=0.0002, loss=0.26] 
Train Epoch #1:   2%|▏         | 360/20799 [01:21<1:15:22,  4.52it/s, shape=torch.Size([32, 32, 16, 16]), global_step=360, grad_norm=0.187, lr=0.0002, loss=0.255]
Train Epoch #1:   2%|▏         | 406/20799 [01:31<1:15:07,  4.52it/s, shape=torch.Size([32, 32, 16, 16]), global_step=406, grad_norm=0.13, lr=0.0002, loss=0.251] 
Train Epoch #1:   2%|▏         | 452/20799 [01:41<1:14:55,  4.53it/s, shape=torch.Size([32, 32, 16, 16]), global_step=452, grad_norm=0.187, lr=0.0002, loss=0.248]
Train Epoch #1:   2%|▏         | 462/20799 [01:44<1:16:20,  4.44it/s, shape=torch.Size([32, 32, 16, 16]), global_step=462, grad_norm=0.162, lr=0.0002, loss=0.248]
run_dir: .exp/diffusion/imagenet_512/dc_ae_f32c32_in_1.0/dit_xl_1/bs_1024_lr_2e-4_fp16
can not find a checkpoint, will train from scratch

Train Epoch #1:   0%|          | 0/41598 [00:00<?, ?it/s]
Train Epoch #1:   0%|          | 38/41598 [00:10<3:04:37,  3.75it/s, shape=torch.Size([32, 32, 16, 16]), global_step=38, grad_norm=0.338, lr=0.0002, loss=0.469]
Train Epoch #1:   0%|          | 84/41598 [00:20<2:44:39,  4.20it/s, shape=torch.Size([32, 32, 16, 16]), global_step=84, grad_norm=0.398, lr=0.0002, loss=0.357]
Train Epoch #1:   0%|          | 130/41598 [00:30<2:39:06,  4.34it/s, shape=torch.Size([32, 32, 16, 16]), global_step=130, grad_norm=0.252, lr=0.0002, loss=0.318]
Train Epoch #1:   0%|          | 176/41598 [00:40<2:36:28,  4.41it/s, shape=torch.Size([32, 32, 16, 16]), global_step=176, grad_norm=0.196, lr=0.0002, loss=0.297]
Train Epoch #1:   1%|          | 222/41598 [00:50<2:35:05,  4.45it/s, shape=torch.Size([32, 32, 16, 16]), global_step=222, grad_norm=0.197, lr=0.0002, loss=0.285]
Train Epoch #1:   1%|          | 268/41598 [01:01<2:34:09,  4.47it/s, shape=torch.Size([32, 32, 16, 16]), global_step=268, grad_norm=0.206, lr=0.0002, loss=0.274]
Train Epoch #1:   1%|          | 314/41598 [01:11<2:33:28,  4.48it/s, shape=torch.Size([32, 32, 16, 16]), global_step=314, grad_norm=0.183, lr=0.0002, loss=0.265]
Train Epoch #1:   1%|          | 360/41598 [01:21<2:32:58,  4.49it/s, shape=torch.Size([32, 32, 16, 16]), global_step=360, grad_norm=0.188, lr=0.0002, loss=0.26] 
Train Epoch #1:   1%|          | 406/41598 [01:31<2:32:37,  4.50it/s, shape=torch.Size([32, 32, 16, 16]), global_step=406, grad_norm=0.171, lr=0.0002, loss=0.256]
Train Epoch #1:   1%|          | 452/41598 [01:42<2:32:57,  4.48it/s, shape=torch.Size([32, 32, 16, 16]), global_step=452, grad_norm=0.166, lr=0.0002, loss=0.252]
Train Epoch #1:   1%|          | 497/41598 [01:52<2:32:38,  4.49it/s, shape=torch.Size([32, 32, 16, 16]), global_step=497, grad_norm=0.159, lr=0.0002, loss=0.25] 
Train Epoch #1:   1%|▏         | 543/41598 [02:02<2:32:12,  4.50it/s, shape=torch.Size([32, 32, 16, 16]), global_step=543, grad_norm=0.195, lr=0.0002, loss=0.249]
Train Epoch #1:   1%|▏         | 589/41598 [02:12<2:31:50,  4.50it/s, shape=torch.Size([32, 32, 16, 16]), global_step=589, grad_norm=0.145, lr=0.0002, loss=0.247]
Train Epoch #1:   2%|▏         | 635/41598 [02:22<2:31:35,  4.50it/s, shape=torch.Size([32, 32, 16, 16]), global_step=635, grad_norm=0.175, lr=0.0002, loss=0.245]
Train Epoch #1:   2%|▏         | 681/41598 [02:32<2:31:18,  4.51it/s, shape=torch.Size([32, 32, 16, 16]), global_step=681, grad_norm=0.166, lr=0.0002, loss=0.243]
Train Epoch #1:   2%|▏         | 727/41598 [02:42<2:31:02,  4.51it/s, shape=torch.Size([32, 32, 16, 16]), global_step=727, grad_norm=0.159, lr=0.0002, loss=0.242]
Train Epoch #1:   2%|▏         | 773/41598 [02:53<2:30:48,  4.51it/s, shape=torch.Size([32, 32, 16, 16]), global_step=773, grad_norm=0.132, lr=0.0002, loss=0.241]
Train Epoch #1:   2%|▏         | 819/41598 [03:03<2:31:17,  4.49it/s, shape=torch.Size([32, 32, 16, 16]), global_step=819, grad_norm=0.169, lr=0.0002, loss=0.239]
Train Epoch #1:   2%|▏         | 865/41598 [03:13<2:30:53,  4.50it/s, shape=torch.Size([32, 32, 16, 16]), global_step=865, grad_norm=0.132, lr=0.0002, loss=0.238]
Train Epoch #1:   2%|▏         | 911/41598 [03:23<2:30:33,  4.50it/s, shape=torch.Size([32, 32, 16, 16]), global_step=911, grad_norm=0.145, lr=0.0002, loss=0.237]
Train Epoch #1:   2%|▏         | 957/41598 [03:34<2:30:13,  4.51it/s, shape=torch.Size([32, 32, 16, 16]), global_step=957, grad_norm=0.144, lr=0.0002, loss=0.236]
Train Epoch #1:   2%|▏         | 1000/41598 [03:50<2:30:04,  4.51it/s, shape=torch.Size([32, 32, 16, 16]), global_step=1000, grad_norm=0.13, lr=0.0002, loss=0.235]
save model to .exp/diffusion/imagenet_512/dc_ae_f32c32_in_1.0/dit_xl_1/bs_1024_lr_2e-4_fp16/checkpoint.pt at step 1000

Train Epoch #1:   2%|▏         | 1001/41598 [03:56<3:24:48,  3.30it/s, shape=torch.Size([32, 32, 16, 16]), global_step=1001, grad_norm=0.113, lr=0.0002, loss=0.235]
Train Epoch #1:   3%|▎         | 1047/41598 [04:06<3:07:59,  3.60it/s, shape=torch.Size([32, 32, 16, 16]), global_step=1047, grad_norm=0.138, lr=0.0002, loss=0.234]
Train Epoch #1:   3%|▎         | 1093/41598 [04:16<2:56:10,  3.83it/s, shape=torch.Size([32, 32, 16, 16]), global_step=1093, grad_norm=0.147, lr=0.0002, loss=0.233]
Train Epoch #1:   3%|▎         | 1139/41598 [04:26<2:47:55,  4.02it/s, shape=torch.Size([32, 32, 16, 16]), global_step=1139, grad_norm=0.151, lr=0.0002, loss=0.233]
Train Epoch #1:   3%|▎         | 1185/41598 [04:36<2:42:11,  4.15it/s, shape=torch.Size([32, 32, 16, 16]), global_step=1185, grad_norm=0.131, lr=0.0002, loss=0.231]
Train Epoch #1:   3%|▎         | 1231/41598 [04:47<2:38:42,  4.24it/s, shape=torch.Size([32, 32, 16, 16]), global_step=1231, grad_norm=0.15, lr=0.0002, loss=0.231] 
Train Epoch #1:   3%|▎         | 1277/41598 [04:57<2:35:43,  4.32it/s, shape=torch.Size([32, 32, 16, 16]), global_step=1277, grad_norm=0.117, lr=0.0002, loss=0.23]
Train Epoch #1:   3%|▎         | 1323/41598 [05:07<2:33:30,  4.37it/s, shape=torch.Size([32, 32, 16, 16]), global_step=1323, grad_norm=0.112, lr=0.0002, loss=0.229]
Train Epoch #1:   3%|▎         | 1369/41598 [05:17<2:31:50,  4.42it/s, shape=torch.Size([32, 32, 16, 16]), global_step=1369, grad_norm=0.122, lr=0.0002, loss=0.229]
Train Epoch #1:   3%|▎         | 1415/41598 [05:27<2:30:41,  4.44it/s, shape=torch.Size([32, 32, 16, 16]), global_step=1415, grad_norm=0.128, lr=0.0002, loss=0.228]
Train Epoch #1:   4%|▎         | 1461/41598 [05:38<2:29:49,  4.46it/s, shape=torch.Size([32, 32, 16, 16]), global_step=1461, grad_norm=0.154, lr=0.0002, loss=0.229]
Train Epoch #1:   4%|▎         | 1507/41598 [05:48<2:29:10,  4.48it/s, shape=torch.Size([32, 32, 16, 16]), global_step=1507, grad_norm=0.14, lr=0.0002, loss=0.228] 
Train Epoch #1:   4%|▎         | 1553/41598 [05:58<2:28:42,  4.49it/s, shape=torch.Size([32, 32, 16, 16]), global_step=1553, grad_norm=0.107, lr=0.0002, loss=0.227]
Train Epoch #1:   4%|▍         | 1599/41598 [06:08<2:28:47,  4.48it/s, shape=torch.Size([32, 32, 16, 16]), global_step=1599, grad_norm=0.126, lr=0.0002, loss=0.227]
Train Epoch #1:   4%|▍         | 1645/41598 [06:19<2:28:23,  4.49it/s, shape=torch.Size([32, 32, 16, 16]), global_step=1645, grad_norm=0.126, lr=0.0002, loss=0.226]
Train Epoch #1:   4%|▍         | 1691/41598 [06:29<2:28:00,  4.49it/s, shape=torch.Size([32, 32, 16, 16]), global_step=1691, grad_norm=0.108, lr=0.0002, loss=0.226]
Train Epoch #1:   4%|▍         | 1737/41598 [06:39<2:27:39,  4.50it/s, shape=torch.Size([32, 32, 16, 16]), global_step=1737, grad_norm=0.128, lr=0.0002, loss=0.226]
Train Epoch #1:   4%|▍         | 1783/41598 [06:49<2:27:21,  4.50it/s, shape=torch.Size([32, 32, 16, 16]), global_step=1783, grad_norm=0.118, lr=0.0002, loss=0.225]
Train Epoch #1:   4%|▍         | 1829/41598 [06:59<2:27:06,  4.51it/s, shape=torch.Size([32, 32, 16, 16]), global_step=1829, grad_norm=0.115, lr=0.0002, loss=0.225]
Train Epoch #1:   5%|▍         | 1875/41598 [07:10<2:26:54,  4.51it/s, shape=torch.Size([32, 32, 16, 16]), global_step=1875, grad_norm=0.126, lr=0.0002, loss=0.224]
Train Epoch #1:   5%|▍         | 1920/41598 [07:20<2:26:44,  4.51it/s, shape=torch.Size([32, 32, 16, 16]), global_step=1920, grad_norm=0.112, lr=0.0002, loss=0.224]
Train Epoch #1:   5%|▍         | 1921/41598 [07:20<2:26:39,  4.51it/s, shape=torch.Size([32, 32, 16, 16]), global_step=1921, grad_norm=0.113, lr=0.0002, loss=0.224]
Train Epoch #1:   5%|▍         | 1967/41598 [07:30<2:26:25,  4.51it/s, shape=torch.Size([32, 32, 16, 16]), global_step=1967, grad_norm=0.153, lr=0.0002, loss=0.223]
Train Epoch #1:   5%|▍         | 2000/41598 [07:50<2:26:18,  4.51it/s, shape=torch.Size([32, 32, 16, 16]), global_step=2000, grad_norm=0.128, lr=0.0002, loss=0.223]
save model to .exp/diffusion/imagenet_512/dc_ae_f32c32_in_1.0/dit_xl_1/bs_1024_lr_2e-4_fp16/checkpoint.pt at step 2000

Train Epoch #1:   5%|▍         | 2001/41598 [07:51<3:29:18,  3.15it/s, shape=torch.Size([32, 32, 16, 16]), global_step=2001, grad_norm=0.114, lr=0.0002, loss=0.223]
Train Epoch #1:   5%|▍         | 2047/41598 [08:01<3:09:05,  3.49it/s, shape=torch.Size([32, 32, 16, 16]), global_step=2047, grad_norm=0.106, lr=0.0002, loss=0.223]
Train Epoch #1:   5%|▌         | 2093/41598 [08:11<2:55:28,  3.75it/s, shape=torch.Size([32, 32, 16, 16]), global_step=2093, grad_norm=0.114, lr=0.0002, loss=0.223]
Train Epoch #1:   5%|▌         | 2139/41598 [08:22<2:46:20,  3.95it/s, shape=torch.Size([32, 32, 16, 16]), global_step=2139, grad_norm=0.14, lr=0.0002, loss=0.223] 
Train Epoch #1:   5%|▌         | 2185/41598 [08:32<2:39:53,  4.11it/s, shape=torch.Size([32, 32, 16, 16]), global_step=2185, grad_norm=0.119, lr=0.0002, loss=0.222]
Train Epoch #1:   5%|▌         | 2231/41598 [08:42<2:35:22,  4.22it/s, shape=torch.Size([32, 32, 16, 16]), global_step=2231, grad_norm=0.123, lr=0.0002, loss=0.222]
Train Epoch #1:   5%|▌         | 2277/41598 [08:52<2:32:11,  4.31it/s, shape=torch.Size([32, 32, 16, 16]), global_step=2277, grad_norm=0.137, lr=0.0002, loss=0.221]
Train Epoch #1:   6%|▌         | 2323/41598 [09:02<2:29:52,  4.37it/s, shape=torch.Size([32, 32, 16, 16]), global_step=2323, grad_norm=0.157, lr=0.0002, loss=0.221]
Train Epoch #1:   6%|▌         | 2369/41598 [09:13<2:28:14,  4.41it/s, shape=torch.Size([32, 32, 16, 16]), global_step=2369, grad_norm=0.107, lr=0.0002, loss=0.22] 
Train Epoch #1:   6%|▌         | 2415/41598 [09:23<2:27:02,  4.44it/s, shape=torch.Size([32, 32, 16, 16]), global_step=2415, grad_norm=0.113, lr=0.0002, loss=0.22]
Train Epoch #1:   6%|▌         | 2461/41598 [09:33<2:26:45,  4.44it/s, shape=torch.Size([32, 32, 16, 16]), global_step=2461, grad_norm=0.113, lr=0.0002, loss=0.22]
Train Epoch #1:   6%|▌         | 2507/41598 [09:43<2:26:00,  4.46it/s, shape=torch.Size([32, 32, 16, 16]), global_step=2507, grad_norm=0.133, lr=0.0002, loss=0.219]
Train Epoch #1:   6%|▌         | 2553/41598 [09:53<2:25:19,  4.48it/s, shape=torch.Size([32, 32, 16, 16]), global_step=2553, grad_norm=0.102, lr=0.0002, loss=0.219]
Train Epoch #1:   6%|▌         | 2599/41598 [10:04<2:24:53,  4.49it/s, shape=torch.Size([32, 32, 16, 16]), global_step=2599, grad_norm=0.107, lr=0.0002, loss=0.219]
Train Epoch #1:   6%|▋         | 2645/41598 [10:14<2:24:30,  4.49it/s, shape=torch.Size([32, 32, 16, 16]), global_step=2645, grad_norm=0.113, lr=0.0002, loss=0.219]
Train Epoch #1:   6%|▋         | 2691/41598 [10:24<2:24:09,  4.50it/s, shape=torch.Size([32, 32, 16, 16]), global_step=2691, grad_norm=0.109, lr=0.0002, loss=0.218]
Train Epoch #1:   7%|▋         | 2737/41598 [10:34<2:23:50,  4.50it/s, shape=torch.Size([32, 32, 16, 16]), global_step=2737, grad_norm=0.106, lr=0.0002, loss=0.218]
Train Epoch #1:   7%|▋         | 2783/41598 [10:44<2:23:34,  4.51it/s, shape=torch.Size([32, 32, 16, 16]), global_step=2783, grad_norm=0.116, lr=0.0002, loss=0.218]
Train Epoch #1:   7%|▋         | 2829/41598 [10:55<2:23:20,  4.51it/s, shape=torch.Size([32, 32, 16, 16]), global_step=2829, grad_norm=0.0959, lr=0.0002, loss=0.218]
Train Epoch #1:   7%|▋         | 2875/41598 [11:05<2:23:40,  4.49it/s, shape=torch.Size([32, 32, 16, 16]), global_step=2875, grad_norm=0.0972, lr=0.0002, loss=0.218]
Train Epoch #1:   7%|▋         | 2921/41598 [11:15<2:23:16,  4.50it/s, shape=torch.Size([32, 32, 16, 16]), global_step=2921, grad_norm=0.115, lr=0.0002, loss=0.217] 
Train Epoch #1:   7%|▋         | 2967/41598 [11:25<2:22:56,  4.50it/s, shape=torch.Size([32, 32, 16, 16]), global_step=2967, grad_norm=0.119, lr=0.0002, loss=0.217]
Train Epoch #1:   7%|▋         | 3000/41598 [11:40<2:22:49,  4.50it/s, shape=torch.Size([32, 32, 16, 16]), global_step=3000, grad_norm=0.105, lr=0.0002, loss=0.217]
save model to .exp/diffusion/imagenet_512/dc_ae_f32c32_in_1.0/dit_xl_1/bs_1024_lr_2e-4_fp16/checkpoint.pt at step 3000

Train Epoch #1:   7%|▋         | 3001/41598 [11:46<3:23:24,  3.16it/s, shape=torch.Size([32, 32, 16, 16]), global_step=3001, grad_norm=0.105, lr=0.0002, loss=0.217]
Train Epoch #1:   7%|▋         | 3047/41598 [11:56<3:03:55,  3.49it/s, shape=torch.Size([32, 32, 16, 16]), global_step=3047, grad_norm=0.104, lr=0.0002, loss=0.217]
Train Epoch #1:   7%|▋         | 3093/41598 [12:07<2:50:46,  3.76it/s, shape=torch.Size([32, 32, 16, 16]), global_step=3093, grad_norm=0.105, lr=0.0002, loss=0.217]
Train Epoch #1:   8%|▊         | 3139/41598 [12:17<2:41:46,  3.96it/s, shape=torch.Size([32, 32, 16, 16]), global_step=3139, grad_norm=0.0963, lr=0.0002, loss=0.217]
Train Epoch #1:   8%|▊         | 3185/41598 [12:27<2:35:32,  4.12it/s, shape=torch.Size([32, 32, 16, 16]), global_step=3185, grad_norm=0.125, lr=0.0002, loss=0.217] 
Train Epoch #1:   8%|▊         | 3231/41598 [12:37<2:31:43,  4.21it/s, shape=torch.Size([32, 32, 16, 16]), global_step=3231, grad_norm=0.088, lr=0.0002, loss=0.216]
Train Epoch #1:   8%|▊         | 3277/41598 [12:47<2:28:28,  4.30it/s, shape=torch.Size([32, 32, 16, 16]), global_step=3277, grad_norm=0.0906, lr=0.0002, loss=0.216]
Train Epoch #1:   8%|▊         | 3323/41598 [12:58<2:26:15,  4.36it/s, shape=torch.Size([32, 32, 16, 16]), global_step=3323, grad_norm=0.101, lr=0.0002, loss=0.216] 
Train Epoch #1:   8%|▊         | 3369/41598 [13:08<2:24:36,  4.41it/s, shape=torch.Size([32, 32, 16, 16]), global_step=3369, grad_norm=0.116, lr=0.0002, loss=0.216]
Train Epoch #1:   8%|▊         | 3415/41598 [13:18<2:23:26,  4.44it/s, shape=torch.Size([32, 32, 16, 16]), global_step=3415, grad_norm=0.106, lr=0.0002, loss=0.216]
Train Epoch #1:   8%|▊         | 3461/41598 [13:28<2:22:30,  4.46it/s, shape=torch.Size([32, 32, 16, 16]), global_step=3461, grad_norm=0.121, lr=0.0002, loss=0.216]
Train Epoch #1:   8%|▊         | 3507/41598 [13:38<2:21:52,  4.47it/s, shape=torch.Size([32, 32, 16, 16]), global_step=3507, grad_norm=0.109, lr=0.0002, loss=0.216]
Train Epoch #1:   9%|▊         | 3553/41598 [13:49<2:21:21,  4.49it/s, shape=torch.Size([32, 32, 16, 16]), global_step=3553, grad_norm=0.104, lr=0.0002, loss=0.216]
Train Epoch #1:   9%|▊         | 3599/41598 [13:59<2:21:19,  4.48it/s, shape=torch.Size([32, 32, 16, 16]), global_step=3599, grad_norm=0.129, lr=0.0002, loss=0.216]
Train Epoch #1:   9%|▉         | 3645/41598 [14:09<2:20:51,  4.49it/s, shape=torch.Size([32, 32, 16, 16]), global_step=3645, grad_norm=0.0977, lr=0.0002, loss=0.216]
Train Epoch #1:   9%|▉         | 3691/41598 [14:19<2:20:26,  4.50it/s, shape=torch.Size([32, 32, 16, 16]), global_step=3691, grad_norm=0.0968, lr=0.0002, loss=0.216]
Train Epoch #1:   9%|▉         | 3737/41598 [14:30<2:20:06,  4.50it/s, shape=torch.Size([32, 32, 16, 16]), global_step=3737, grad_norm=0.103, lr=0.0002, loss=0.215] 
Train Epoch #1:   9%|▉         | 3782/41598 [14:40<2:19:56,  4.50it/s, shape=torch.Size([32, 32, 16, 16]), global_step=3782, grad_norm=0.0863, lr=0.0002, loss=0.215]
Train Epoch #1:   9%|▉         | 3783/41598 [14:40<2:19:48,  4.51it/s, shape=torch.Size([32, 32, 16, 16]), global_step=3783, grad_norm=0.102, lr=0.0002, loss=0.215] 
Train Epoch #1:   9%|▉         | 3829/41598 [14:50<2:19:32,  4.51it/s, shape=torch.Size([32, 32, 16, 16]), global_step=3829, grad_norm=0.109, lr=0.0002, loss=0.215]
Train Epoch #1:   9%|▉         | 3875/41598 [15:00<2:19:17,  4.51it/s, shape=torch.Size([32, 32, 16, 16]), global_step=3875, grad_norm=0.112, lr=0.0002, loss=0.215]
Train Epoch #1:   9%|▉         | 3921/41598 [15:10<2:19:09,  4.51it/s, shape=torch.Size([32, 32, 16, 16]), global_step=3921, grad_norm=0.108, lr=0.0002, loss=0.215]
Train Epoch #1:  10%|▉         | 3967/41598 [15:20<2:18:57,  4.51it/s, shape=torch.Size([32, 32, 16, 16]), global_step=3967, grad_norm=0.0919, lr=0.0002, loss=0.215]
Train Epoch #1:  10%|▉         | 4000/41598 [15:40<2:18:49,  4.51it/s, shape=torch.Size([32, 32, 16, 16]), global_step=4000, grad_norm=0.105, lr=0.0002, loss=0.215]
save model to .exp/diffusion/imagenet_512/dc_ae_f32c32_in_1.0/dit_xl_1/bs_1024_lr_2e-4_fp16/checkpoint.pt at step 4000

Train Epoch #1:  10%|▉         | 4001/41598 [15:41<3:18:46,  3.15it/s, shape=torch.Size([32, 32, 16, 16]), global_step=4001, grad_norm=0.0955, lr=0.0002, loss=0.215]
Train Epoch #1:  10%|▉         | 4047/41598 [15:52<2:59:30,  3.49it/s, shape=torch.Size([32, 32, 16, 16]), global_step=4047, grad_norm=0.0868, lr=0.0002, loss=0.215]
Train Epoch #1:  10%|▉         | 4093/41598 [16:02<2:46:33,  3.75it/s, shape=torch.Size([32, 32, 16, 16]), global_step=4093, grad_norm=0.098, lr=0.0002, loss=0.214] 
Train Epoch #1:  10%|▉         | 4139/41598 [16:12<2:37:44,  3.96it/s, shape=torch.Size([32, 32, 16, 16]), global_step=4139, grad_norm=0.0889, lr=0.0002, loss=0.214]
Train Epoch #1:  10%|█         | 4185/41598 [16:22<2:31:36,  4.11it/s, shape=torch.Size([32, 32, 16, 16]), global_step=4185, grad_norm=0.0853, lr=0.0002, loss=0.214]
Train Epoch #1:  10%|█         | 4231/41598 [16:32<2:27:21,  4.23it/s, shape=torch.Size([32, 32, 16, 16]), global_step=4231, grad_norm=0.0948, lr=0.0002, loss=0.214]
Train Epoch #1:  10%|█         | 4277/41598 [16:43<2:24:22,  4.31it/s, shape=torch.Size([32, 32, 16, 16]), global_step=4277, grad_norm=0.0864, lr=0.0002, loss=0.214]
Train Epoch #1:  10%|█         | 4323/41598 [16:53<2:22:12,  4.37it/s, shape=torch.Size([32, 32, 16, 16]), global_step=4323, grad_norm=0.0959, lr=0.0002, loss=0.214]
Train Epoch #1:  11%|█         | 4369/41598 [17:03<2:20:37,  4.41it/s, shape=torch.Size([32, 32, 16, 16]), global_step=4369, grad_norm=0.0887, lr=0.0002, loss=0.214]
Train Epoch #1:  11%|█         | 4415/41598 [17:13<2:19:31,  4.44it/s, shape=torch.Size([32, 32, 16, 16]), global_step=4415, grad_norm=0.0881, lr=0.0002, loss=0.214]
Train Epoch #1:  11%|█         | 4461/41598 [17:24<2:19:15,  4.44it/s, shape=torch.Size([32, 32, 16, 16]), global_step=4461, grad_norm=0.111, lr=0.0002, loss=0.214] 
Train Epoch #1:  11%|█         | 4507/41598 [17:34<2:18:23,  4.47it/s, shape=torch.Size([32, 32, 16, 16]), global_step=4507, grad_norm=0.108, lr=0.0002, loss=0.214]
Train Epoch #1:  11%|█         | 4553/41598 [17:44<2:17:46,  4.48it/s, shape=torch.Size([32, 32, 16, 16]), global_step=4553, grad_norm=0.111, lr=0.0002, loss=0.213]
Train Epoch #1:  11%|█         | 4599/41598 [17:54<2:17:17,  4.49it/s, shape=torch.Size([32, 32, 16, 16]), global_step=4599, grad_norm=0.102, lr=0.0002, loss=0.213]
Train Epoch #1:  11%|█         | 4645/41598 [18:04<2:16:52,  4.50it/s, shape=torch.Size([32, 32, 16, 16]), global_step=4645, grad_norm=0.0863, lr=0.0002, loss=0.213]
Train Epoch #1:  11%|█▏        | 4691/41598 [18:14<2:16:35,  4.50it/s, shape=torch.Size([32, 32, 16, 16]), global_step=4691, grad_norm=0.0907, lr=0.0002, loss=0.213]
Train Epoch #1:  11%|█▏        | 4737/41598 [18:25<2:16:18,  4.51it/s, shape=torch.Size([32, 32, 16, 16]), global_step=4737, grad_norm=0.0886, lr=0.0002, loss=0.213]
Train Epoch #1:  11%|█▏        | 4783/41598 [18:35<2:16:03,  4.51it/s, shape=torch.Size([32, 32, 16, 16]), global_step=4783, grad_norm=0.0805, lr=0.0002, loss=0.213]
Train Epoch #1:  12%|█▏        | 4829/41598 [18:45<2:16:21,  4.49it/s, shape=torch.Size([32, 32, 16, 16]), global_step=4829, grad_norm=0.0818, lr=0.0002, loss=0.213]
Train Epoch #1:  12%|█▏        | 4875/41598 [18:55<2:15:59,  4.50it/s, shape=torch.Size([32, 32, 16, 16]), global_step=4875, grad_norm=0.0849, lr=0.0002, loss=0.213]
Train Epoch #1:  12%|█▏        | 4921/41598 [19:06<2:15:43,  4.50it/s, shape=torch.Size([32, 32, 16, 16]), global_step=4921, grad_norm=0.0725, lr=0.0002, loss=0.213]
Train Epoch #1:  12%|█▏        | 4967/41598 [19:16<2:15:25,  4.51it/s, shape=torch.Size([32, 32, 16, 16]), global_step=4967, grad_norm=0.0815, lr=0.0002, loss=0.213]
Train Epoch #1:  12%|█▏        | 5000/41598 [19:30<2:15:18,  4.51it/s, shape=torch.Size([32, 32, 16, 16]), global_step=5000, grad_norm=0.0895, lr=0.0002, loss=0.213]
save model to .exp/diffusion/imagenet_512/dc_ae_f32c32_in_1.0/dit_xl_1/bs_1024_lr_2e-4_fp16/checkpoint.pt at step 5000

Train Epoch #1:  12%|█▏        | 5001/41598 [19:38<3:18:32,  3.07it/s, shape=torch.Size([32, 32, 16, 16]), global_step=5001, grad_norm=0.11, lr=0.0002, loss=0.213]  
Train Epoch #1:  12%|█▏        | 5047/41598 [19:48<2:58:14,  3.42it/s, shape=torch.Size([32, 32, 16, 16]), global_step=5047, grad_norm=0.1, lr=0.0002, loss=0.213] 
Train Epoch #1:  12%|█▏        | 5093/41598 [19:58<2:44:30,  3.70it/s, shape=torch.Size([32, 32, 16, 16]), global_step=5093, grad_norm=0.0813, lr=0.0002, loss=0.213]
Train Epoch #1:  12%|█▏        | 5139/41598 [20:09<2:35:19,  3.91it/s, shape=torch.Size([32, 32, 16, 16]), global_step=5139, grad_norm=0.099, lr=0.0002, loss=0.213] 
Train Epoch #1:  12%|█▏        | 5184/41598 [20:19<2:29:30,  4.06it/s, shape=torch.Size([32, 32, 16, 16]), global_step=5184, grad_norm=0.104, lr=0.0002, loss=0.212]
Train Epoch #1:  13%|█▎        | 5230/41598 [20:29<2:24:41,  4.19it/s, shape=torch.Size([32, 32, 16, 16]), global_step=5230, grad_norm=0.0829, lr=0.0002, loss=0.212]
Train Epoch #1:  13%|█▎        | 5276/41598 [20:39<2:21:26,  4.28it/s, shape=torch.Size([32, 32, 16, 16]), global_step=5276, grad_norm=0.0901, lr=0.0002, loss=0.212]
Train Epoch #1:  13%|█▎        | 5322/41598 [20:49<2:19:05,  4.35it/s, shape=torch.Size([32, 32, 16, 16]), global_step=5322, grad_norm=0.0725, lr=0.0002, loss=0.212]
Train Epoch #1:  13%|█▎        | 5368/41598 [20:59<2:17:21,  4.40it/s, shape=torch.Size([32, 32, 16, 16]), global_step=5368, grad_norm=0.0764, lr=0.0002, loss=0.212]
Train Epoch #1:  13%|█▎        | 5413/41598 [21:10<2:17:11,  4.40it/s, shape=torch.Size([32, 32, 16, 16]), global_step=5413, grad_norm=0.0762, lr=0.0002, loss=0.212]
Train Epoch #1:  13%|█▎        | 5414/41598 [21:10<2:16:08,  4.43it/s, shape=torch.Size([32, 32, 16, 16]), global_step=5414, grad_norm=0.0881, lr=0.0002, loss=0.212]
Train Epoch #1:  13%|█▎        | 5460/41598 [21:20<2:15:14,  4.45it/s, shape=torch.Size([32, 32, 16, 16]), global_step=5460, grad_norm=0.0783, lr=0.0002, loss=0.212]
Train Epoch #1:  13%|█▎        | 5506/41598 [21:30<2:14:31,  4.47it/s, shape=torch.Size([32, 32, 16, 16]), global_step=5506, grad_norm=0.0985, lr=0.0002, loss=0.212]
Train Epoch #1:  13%|█▎        | 5552/41598 [21:40<2:14:29,  4.47it/s, shape=torch.Size([32, 32, 16, 16]), global_step=5552, grad_norm=0.0746, lr=0.0002, loss=0.212]
Train Epoch #1:  13%|█▎        | 5598/41598 [21:51<2:13:53,  4.48it/s, shape=torch.Size([32, 32, 16, 16]), global_step=5598, grad_norm=0.089, lr=0.0002, loss=0.211] 
Train Epoch #1:  14%|█▎        | 5644/41598 [22:01<2:13:31,  4.49it/s, shape=torch.Size([32, 32, 16, 16]), global_step=5644, grad_norm=0.0981, lr=0.0002, loss=0.211]
Train Epoch #1:  14%|█▎        | 5690/41598 [22:11<2:13:07,  4.50it/s, shape=torch.Size([32, 32, 16, 16]), global_step=5690, grad_norm=0.083, lr=0.0002, loss=0.211] 
Train Epoch #1:  14%|█▍        | 5736/41598 [22:21<2:12:47,  4.50it/s, shape=torch.Size([32, 32, 16, 16]), global_step=5736, grad_norm=0.0793, lr=0.0002, loss=0.211]
Train Epoch #1:  14%|█▍        | 5782/41598 [22:31<2:12:28,  4.51it/s, shape=torch.Size([32, 32, 16, 16]), global_step=5782, grad_norm=0.0676, lr=0.0002, loss=0.211]
Train Epoch #1:  14%|█▍        | 5828/41598 [22:42<2:12:17,  4.51it/s, shape=torch.Size([32, 32, 16, 16]), global_step=5828, grad_norm=0.073, lr=0.0002, loss=0.211] 
Train Epoch #1:  14%|█▍        | 5874/41598 [22:52<2:12:06,  4.51it/s, shape=torch.Size([32, 32, 16, 16]), global_step=5874, grad_norm=0.0691, lr=0.0002, loss=0.211]
Train Epoch #1:  14%|█▍        | 5920/41598 [23:02<2:11:51,  4.51it/s, shape=torch.Size([32, 32, 16, 16]), global_step=5920, grad_norm=0.0707, lr=0.0002, loss=0.211]
Train Epoch #1:  14%|█▍        | 5966/41598 [23:12<2:12:10,  4.49it/s, shape=torch.Size([32, 32, 16, 16]), global_step=5966, grad_norm=0.0715, lr=0.0002, loss=0.211]
Train Epoch #1:  14%|█▍        | 6000/41598 [23:30<2:12:02,  4.49it/s, shape=torch.Size([32, 32, 16, 16]), global_step=6000, grad_norm=0.0748, lr=0.0002, loss=0.211]
save model to .exp/diffusion/imagenet_512/dc_ae_f32c32_in_1.0/dit_xl_1/bs_1024_lr_2e-4_fp16/checkpoint.pt at step 6000

Train Epoch #1:  14%|█▍        | 6001/41598 [23:35<3:13:47,  3.06it/s, shape=torch.Size([32, 32, 16, 16]), global_step=6001, grad_norm=0.0764, lr=0.0002, loss=0.211]
Train Epoch #1:  15%|█▍        | 6047/41598 [23:45<2:53:53,  3.41it/s, shape=torch.Size([32, 32, 16, 16]), global_step=6047, grad_norm=0.0847, lr=0.0002, loss=0.211]
Train Epoch #1:  15%|█▍        | 6093/41598 [23:55<2:40:26,  3.69it/s, shape=torch.Size([32, 32, 16, 16]), global_step=6093, grad_norm=0.0682, lr=0.0002, loss=0.211]
Train Epoch #1:  15%|█▍        | 6139/41598 [24:05<2:31:18,  3.91it/s, shape=torch.Size([32, 32, 16, 16]), global_step=6139, grad_norm=0.0916, lr=0.0002, loss=0.211]
Train Epoch #1:  15%|█▍        | 6185/41598 [24:16<2:24:54,  4.07it/s, shape=torch.Size([32, 32, 16, 16]), global_step=6185, grad_norm=0.0722, lr=0.0002, loss=0.211]
Train Epoch #1:  15%|█▍        | 6231/41598 [24:26<2:20:27,  4.20it/s, shape=torch.Size([32, 32, 16, 16]), global_step=6231, grad_norm=0.0691, lr=0.0002, loss=0.211]
Train Epoch #1:  15%|█▌        | 6277/41598 [24:36<2:17:19,  4.29it/s, shape=torch.Size([32, 32, 16, 16]), global_step=6277, grad_norm=0.0788, lr=0.0002, loss=0.211]
Train Epoch #1:  15%|█▌        | 6323/41598 [24:46<2:15:05,  4.35it/s, shape=torch.Size([32, 32, 16, 16]), global_step=6323, grad_norm=0.0717, lr=0.0002, loss=0.211]
Train Epoch #1:  15%|█▌        | 6369/41598 [24:57<2:14:05,  4.38it/s, shape=torch.Size([32, 32, 16, 16]), global_step=6369, grad_norm=0.0665, lr=0.0002, loss=0.21] 
Train Epoch #1:  15%|█▌        | 6415/41598 [25:07<2:12:49,  4.41it/s, shape=torch.Size([32, 32, 16, 16]), global_step=6415, grad_norm=0.0733, lr=0.0002, loss=0.21]
Train Epoch #1:  16%|█▌        | 6461/41598 [25:17<2:11:50,  4.44it/s, shape=torch.Size([32, 32, 16, 16]), global_step=6461, grad_norm=0.0812, lr=0.0002, loss=0.21]
Train Epoch #1:  16%|█▌        | 6507/41598 [25:27<2:11:01,  4.46it/s, shape=torch.Size([32, 32, 16, 16]), global_step=6507, grad_norm=0.0779, lr=0.0002, loss=0.21]
Train Epoch #1:  16%|█▌        | 6553/41598 [25:37<2:10:27,  4.48it/s, shape=torch.Size([32, 32, 16, 16]), global_step=6553, grad_norm=0.082, lr=0.0002, loss=0.21] 
Train Epoch #1:  16%|█▌        | 6599/41598 [25:48<2:09:56,  4.49it/s, shape=torch.Size([32, 32, 16, 16]), global_step=6599, grad_norm=0.0662, lr=0.0002, loss=0.21]
Train Epoch #1:  16%|█▌        | 6645/41598 [25:58<2:09:36,  4.49it/s, shape=torch.Size([32, 32, 16, 16]), global_step=6645, grad_norm=0.0736, lr=0.0002, loss=0.21]
Train Epoch #1:  16%|█▌        | 6691/41598 [26:08<2:09:18,  4.50it/s, shape=torch.Size([32, 32, 16, 16]), global_step=6691, grad_norm=0.0649, lr=0.0002, loss=0.21]
Train Epoch #1:  16%|█▌        | 6737/41598 [26:18<2:09:00,  4.50it/s, shape=torch.Size([32, 32, 16, 16]), global_step=6737, grad_norm=0.0807, lr=0.0002, loss=0.21]
Train Epoch #1:  16%|█▋        | 6783/41598 [26:28<2:08:43,  4.51it/s, shape=torch.Size([32, 32, 16, 16]), global_step=6783, grad_norm=0.0769, lr=0.0002, loss=0.21]
Train Epoch #1:  16%|█▋        | 6829/41598 [26:39<2:09:04,  4.49it/s, shape=torch.Size([32, 32, 16, 16]), global_step=6829, grad_norm=0.0634, lr=0.0002, loss=0.21]
Train Epoch #1:  17%|█▋        | 6875/41598 [26:49<2:08:45,  4.49it/s, shape=torch.Size([32, 32, 16, 16]), global_step=6875, grad_norm=0.0655, lr=0.0002, loss=0.21]
Train Epoch #1:  17%|█▋        | 6921/41598 [26:59<2:08:24,  4.50it/s, shape=torch.Size([32, 32, 16, 16]), global_step=6921, grad_norm=0.0564, lr=0.0002, loss=0.21]
Train Epoch #1:  17%|█▋        | 6967/41598 [27:09<2:08:08,  4.50it/s, shape=torch.Size([32, 32, 16, 16]), global_step=6967, grad_norm=0.0604, lr=0.0002, loss=0.21]
Train Epoch #1:  17%|█▋        | 7000/41598 [27:20<2:08:01,  4.50it/s, shape=torch.Size([32, 32, 16, 16]), global_step=7000, grad_norm=0.0555, lr=0.0002, loss=0.21]
save model to .exp/diffusion/imagenet_512/dc_ae_f32c32_in_1.0/dit_xl_1/bs_1024_lr_2e-4_fp16/checkpoint.pt at step 7000

Train Epoch #1:  17%|█▋        | 7001/41598 [27:30<3:02:08,  3.17it/s, shape=torch.Size([32, 32, 16, 16]), global_step=7001, grad_norm=0.0581, lr=0.0002, loss=0.21]
Train Epoch #1:  17%|█▋        | 7047/41598 [27:40<2:44:40,  3.50it/s, shape=torch.Size([32, 32, 16, 16]), global_step=7047, grad_norm=0.0551, lr=0.0002, loss=0.21]
Train Epoch #1:  17%|█▋        | 7093/41598 [27:51<2:32:56,  3.76it/s, shape=torch.Size([32, 32, 16, 16]), global_step=7093, grad_norm=0.0599, lr=0.0002, loss=0.209]
Train Epoch #1:  17%|█▋        | 7139/41598 [28:01<2:24:54,  3.96it/s, shape=torch.Size([32, 32, 16, 16]), global_step=7139, grad_norm=0.0709, lr=0.0002, loss=0.209]
Train Epoch #1:  17%|█▋        | 7185/41598 [28:11<2:19:56,  4.10it/s, shape=torch.Size([32, 32, 16, 16]), global_step=7185, grad_norm=0.0665, lr=0.0002, loss=0.209]
Train Epoch #1:  17%|█▋        | 7231/41598 [28:21<2:15:53,  4.21it/s, shape=torch.Size([32, 32, 16, 16]), global_step=7231, grad_norm=0.0825, lr=0.0002, loss=0.209]
Train Epoch #1:  17%|█▋        | 7277/41598 [28:31<2:12:58,  4.30it/s, shape=torch.Size([32, 32, 16, 16]), global_step=7277, grad_norm=0.056, lr=0.0002, loss=0.209] 
Train Epoch #1:  18%|█▊        | 7323/41598 [28:42<2:10:56,  4.36it/s, shape=torch.Size([32, 32, 16, 16]), global_step=7323, grad_norm=0.0661, lr=0.0002, loss=0.209]
Train Epoch #1:  18%|█▊        | 7369/41598 [28:52<2:09:25,  4.41it/s, shape=torch.Size([32, 32, 16, 16]), global_step=7369, grad_norm=0.0523, lr=0.0002, loss=0.209]
Train Epoch #1:  18%|█▊        | 7415/41598 [29:02<2:08:21,  4.44it/s, shape=torch.Size([32, 32, 16, 16]), global_step=7415, grad_norm=0.0703, lr=0.0002, loss=0.209]
Train Epoch #1:  18%|█▊        | 7461/41598 [29:12<2:07:35,  4.46it/s, shape=torch.Size([32, 32, 16, 16]), global_step=7461, grad_norm=0.0575, lr=0.0002, loss=0.209]
Train Epoch #1:  18%|█▊        | 7507/41598 [29:22<2:06:59,  4.47it/s, shape=torch.Size([32, 32, 16, 16]), global_step=7507, grad_norm=0.0621, lr=0.0002, loss=0.209]
Train Epoch #1:  18%|█▊        | 7553/41598 [29:33<2:07:00,  4.47it/s, shape=torch.Size([32, 32, 16, 16]), global_step=7553, grad_norm=0.0572, lr=0.0002, loss=0.209]
Train Epoch #1:  18%|█▊        | 7599/41598 [29:43<2:06:28,  4.48it/s, shape=torch.Size([32, 32, 16, 16]), global_step=7599, grad_norm=0.0588, lr=0.0002, loss=0.209]
Train Epoch #1:  18%|█▊        | 7645/41598 [29:53<2:06:06,  4.49it/s, shape=torch.Size([32, 32, 16, 16]), global_step=7645, grad_norm=0.0514, lr=0.0002, loss=0.209]
Train Epoch #1:  18%|█▊        | 7691/41598 [30:03<2:05:45,  4.49it/s, shape=torch.Size([32, 32, 16, 16]), global_step=7691, grad_norm=0.0525, lr=0.0002, loss=0.209]
Train Epoch #1:  19%|█▊        | 7737/41598 [30:14<2:05:26,  4.50it/s, shape=torch.Size([32, 32, 16, 16]), global_step=7737, grad_norm=0.058, lr=0.0002, loss=0.209] 
Train Epoch #1:  19%|█▊        | 7783/41598 [30:24<2:05:12,  4.50it/s, shape=torch.Size([32, 32, 16, 16]), global_step=7783, grad_norm=0.0552, lr=0.0002, loss=0.209]
Train Epoch #1:  19%|█▉        | 7829/41598 [30:34<2:04:57,  4.50it/s, shape=torch.Size([32, 32, 16, 16]), global_step=7829, grad_norm=0.053, lr=0.0002, loss=0.209] 
Train Epoch #1:  19%|█▉        | 7875/41598 [30:44<2:04:41,  4.51it/s, shape=torch.Size([32, 32, 16, 16]), global_step=7875, grad_norm=0.0671, lr=0.0002, loss=0.209]
Train Epoch #1:  19%|█▉        | 7921/41598 [30:55<2:04:57,  4.49it/s, shape=torch.Size([32, 32, 16, 16]), global_step=7921, grad_norm=0.0586, lr=0.0002, loss=0.209]
Train Epoch #1:  19%|█▉        | 7967/41598 [31:05<2:04:36,  4.50it/s, shape=torch.Size([32, 32, 16, 16]), global_step=7967, grad_norm=0.0567, lr=0.0002, loss=0.209]
Train Epoch #1:  19%|█▉        | 8000/41598 [31:20<2:04:29,  4.50it/s, shape=torch.Size([32, 32, 16, 16]), global_step=8000, grad_norm=0.0577, lr=0.0002, loss=0.209]
save model to .exp/diffusion/imagenet_512/dc_ae_f32c32_in_1.0/dit_xl_1/bs_1024_lr_2e-4_fp16/checkpoint.pt at step 8000

Train Epoch #1:  19%|█▉        | 8001/41598 [31:26<2:58:02,  3.14it/s, shape=torch.Size([32, 32, 16, 16]), global_step=8001, grad_norm=0.0592, lr=0.0002, loss=0.209]
Train Epoch #1:  19%|█▉        | 8047/41598 [31:36<2:40:41,  3.48it/s, shape=torch.Size([32, 32, 16, 16]), global_step=8047, grad_norm=0.0468, lr=0.0002, loss=0.209]
Train Epoch #1:  19%|█▉        | 8093/41598 [31:46<2:28:57,  3.75it/s, shape=torch.Size([32, 32, 16, 16]), global_step=8093, grad_norm=0.058, lr=0.0002, loss=0.208] 
Train Epoch #1:  20%|█▉        | 8139/41598 [31:56<2:21:02,  3.95it/s, shape=torch.Size([32, 32, 16, 16]), global_step=8139, grad_norm=0.0472, lr=0.0002, loss=0.208]
Train Epoch #1:  20%|█▉        | 8185/41598 [32:07<2:15:30,  4.11it/s, shape=torch.Size([32, 32, 16, 16]), global_step=8185, grad_norm=0.0571, lr=0.0002, loss=0.208]
Train Epoch #1:  20%|█▉        | 8231/41598 [32:17<2:11:39,  4.22it/s, shape=torch.Size([32, 32, 16, 16]), global_step=8231, grad_norm=0.046, lr=0.0002, loss=0.208] 
Train Epoch #1:  20%|█▉        | 8277/41598 [32:27<2:09:24,  4.29it/s, shape=torch.Size([32, 32, 16, 16]), global_step=8277, grad_norm=0.0581, lr=0.0002, loss=0.208]
Train Epoch #1:  20%|██        | 8323/41598 [32:37<2:07:17,  4.36it/s, shape=torch.Size([32, 32, 16, 16]), global_step=8323, grad_norm=0.0559, lr=0.0002, loss=0.208]
Train Epoch #1:  20%|██        | 8369/41598 [32:47<2:05:48,  4.40it/s, shape=torch.Size([32, 32, 16, 16]), global_step=8369, grad_norm=0.0488, lr=0.0002, loss=0.208]
Train Epoch #1:  20%|██        | 8415/41598 [32:58<2:04:42,  4.43it/s, shape=torch.Size([32, 32, 16, 16]), global_step=8415, grad_norm=0.0548, lr=0.0002, loss=0.208]
Train Epoch #1:  20%|██        | 8461/41598 [33:08<2:03:51,  4.46it/s, shape=torch.Size([32, 32, 16, 16]), global_step=8461, grad_norm=0.0602, lr=0.0002, loss=0.208]
Train Epoch #1:  20%|██        | 8507/41598 [33:18<2:03:12,  4.48it/s, shape=torch.Size([32, 32, 16, 16]), global_step=8507, grad_norm=0.0558, lr=0.0002, loss=0.208]
Train Epoch #1:  21%|██        | 8553/41598 [33:28<2:02:49,  4.48it/s, shape=torch.Size([32, 32, 16, 16]), global_step=8553, grad_norm=0.0524, lr=0.0002, loss=0.208]
Train Epoch #1:  21%|██        | 8599/41598 [33:39<2:02:45,  4.48it/s, shape=torch.Size([32, 32, 16, 16]), global_step=8599, grad_norm=0.0641, lr=0.0002, loss=0.208]
Train Epoch #1:  21%|██        | 8645/41598 [33:49<2:02:21,  4.49it/s, shape=torch.Size([32, 32, 16, 16]), global_step=8645, grad_norm=0.055, lr=0.0002, loss=0.208] 
Train Epoch #1:  21%|██        | 8691/41598 [33:59<2:02:02,  4.49it/s, shape=torch.Size([32, 32, 16, 16]), global_step=8691, grad_norm=0.0491, lr=0.0002, loss=0.208]
Train Epoch #1:  21%|██        | 8737/41598 [34:09<2:01:45,  4.50it/s, shape=torch.Size([32, 32, 16, 16]), global_step=8737, grad_norm=0.0527, lr=0.0002, loss=0.208]
Train Epoch #1:  21%|██        | 8783/41598 [34:19<2:01:28,  4.50it/s, shape=torch.Size([32, 32, 16, 16]), global_step=8783, grad_norm=0.0427, lr=0.0002, loss=0.208]
Train Epoch #1:  21%|██        | 8829/41598 [34:30<2:01:14,  4.50it/s, shape=torch.Size([32, 32, 16, 16]), global_step=8829, grad_norm=0.0502, lr=0.0002, loss=0.208]
Train Epoch #1:  21%|██▏       | 8874/41598 [34:40<2:01:04,  4.50it/s, shape=torch.Size([32, 32, 16, 16]), global_step=8874, grad_norm=0.0478, lr=0.0002, loss=0.208]
Train Epoch #1:  21%|██▏       | 8875/41598 [34:40<2:00:59,  4.51it/s, shape=torch.Size([32, 32, 16, 16]), global_step=8875, grad_norm=0.0445, lr=0.0002, loss=0.208]
Train Epoch #1:  21%|██▏       | 8921/41598 [34:50<2:00:46,  4.51it/s, shape=torch.Size([32, 32, 16, 16]), global_step=8921, grad_norm=0.0485, lr=0.0002, loss=0.208]
Train Epoch #1:  22%|██▏       | 8967/41598 [35:00<2:00:34,  4.51it/s, shape=torch.Size([32, 32, 16, 16]), global_step=8967, grad_norm=0.0527, lr=0.0002, loss=0.208]
Train Epoch #1:  22%|██▏       | 9000/41598 [35:20<2:00:26,  4.51it/s, shape=torch.Size([32, 32, 16, 16]), global_step=9000, grad_norm=0.0529, lr=0.0002, loss=0.208]
save model to .exp/diffusion/imagenet_512/dc_ae_f32c32_in_1.0/dit_xl_1/bs_1024_lr_2e-4_fp16/checkpoint.pt at step 9000

Train Epoch #1:  22%|██▏       | 9001/41598 [35:22<2:53:55,  3.12it/s, shape=torch.Size([32, 32, 16, 16]), global_step=9001, grad_norm=0.0566, lr=0.0002, loss=0.208]
run_dir: .exp/diffusion/imagenet_512/dc_ae_f32c32_in_1.0/dit_xl_1/bs_1024_lr_2e-4_fp16
loading checkpoint .exp/diffusion/imagenet_512/dc_ae_f32c32_in_1.0/dit_xl_1/bs_1024_lr_2e-4_fp16/checkpoint.pt
optimizer loaded
scaler loaded
epoch: 0
global_step=9000
lr scheduler loaded
train generator state loaded
torch rng state loaded
torch cuda rng state loaded
best_fid=inf
checkpoint .exp/diffusion/imagenet_512/dc_ae_f32c32_in_1.0/dit_xl_1/bs_1024_lr_2e-4_fp16/checkpoint.pt loaded

Train Epoch #1:   0%|          | 0/10399 [00:00<?, ?it/s]
skipping first 9000 steps
Train Epoch #1:  87%|████████▋ | 9013/10399 [00:10<00:01, 871.89it/s, shape=torch.Size([128, 32, 16, 16]), global_step=9013, grad_norm=0.025, lr=0.0002, loss=0.205]
Train Epoch #1:  87%|████████▋ | 9036/10399 [00:25<00:01, 871.89it/s, shape=torch.Size([128, 32, 16, 16]), global_step=9036, grad_norm=0.0208, lr=0.0002, loss=0.205]
Train Epoch #1:  87%|████████▋ | 9037/10399 [00:25<00:04, 277.49it/s, shape=torch.Size([128, 32, 16, 16]), global_step=9037, grad_norm=0.0244, lr=0.0002, loss=0.204]
Train Epoch #1:  87%|████████▋ | 9053/10399 [00:36<00:07, 168.68it/s, shape=torch.Size([128, 32, 16, 16]), global_step=9053, grad_norm=0.0218, lr=0.0002, loss=0.205]
Train Epoch #1:  87%|████████▋ | 9069/10399 [00:46<00:12, 108.44it/s, shape=torch.Size([128, 32, 16, 16]), global_step=9069, grad_norm=0.0193, lr=0.0002, loss=0.202]
Train Epoch #1:  87%|████████▋ | 9085/10399 [00:57<00:18, 72.10it/s, shape=torch.Size([128, 32, 16, 16]), global_step=9085, grad_norm=0.0218, lr=0.0002, loss=0.201] 
Train Epoch #1:  88%|████████▊ | 9101/10399 [01:07<00:26, 49.03it/s, shape=torch.Size([128, 32, 16, 16]), global_step=9101, grad_norm=0.0216, lr=0.0002, loss=0.197]
Train Epoch #1:  88%|████████▊ | 9117/10399 [01:17<00:37, 33.89it/s, shape=torch.Size([128, 32, 16, 16]), global_step=9117, grad_norm=0.0203, lr=0.0002, loss=0.196]
Train Epoch #1:  88%|████████▊ | 9133/10399 [01:28<00:53, 23.77it/s, shape=torch.Size([128, 32, 16, 16]), global_step=9133, grad_norm=0.0196, lr=0.0002, loss=0.196]
Train Epoch #1:  88%|████████▊ | 9149/10399 [01:38<01:13, 16.90it/s, shape=torch.Size([128, 32, 16, 16]), global_step=9149, grad_norm=0.023, lr=0.0002, loss=0.197] 
Train Epoch #1:  88%|████████▊ | 9165/10399 [01:49<01:41, 12.19it/s, shape=torch.Size([128, 32, 16, 16]), global_step=9165, grad_norm=0.0212, lr=0.0002, loss=0.198]
Train Epoch #1:  88%|████████▊ | 9181/10399 [01:59<02:16,  8.95it/s, shape=torch.Size([128, 32, 16, 16]), global_step=9181, grad_norm=0.0199, lr=0.0002, loss=0.198]
Train Epoch #1:  88%|████████▊ | 9196/10399 [02:10<02:14,  8.95it/s, shape=torch.Size([128, 32, 16, 16]), global_step=9196, grad_norm=0.0225, lr=0.0002, loss=0.198]
Train Epoch #1:  88%|████████▊ | 9197/10399 [02:10<02:59,  6.70it/s, shape=torch.Size([128, 32, 16, 16]), global_step=9197, grad_norm=0.0209, lr=0.0002, loss=0.198]
Train Epoch #1:  89%|████████▊ | 9213/10399 [02:20<03:50,  5.14it/s, shape=torch.Size([128, 32, 16, 16]), global_step=9213, grad_norm=0.0209, lr=0.0002, loss=0.198]
Train Epoch #1:  89%|████████▊ | 9229/10399 [02:30<04:48,  4.05it/s, shape=torch.Size([128, 32, 16, 16]), global_step=9229, grad_norm=0.0215, lr=0.0002, loss=0.199]
Train Epoch #1:  89%|████████▉ | 9245/10399 [02:41<05:50,  3.30it/s, shape=torch.Size([128, 32, 16, 16]), global_step=9245, grad_norm=0.0208, lr=0.0002, loss=0.198]
Train Epoch #1:  89%|████████▉ | 9261/10399 [02:51<06:51,  2.77it/s, shape=torch.Size([128, 32, 16, 16]), global_step=9261, grad_norm=0.0211, lr=0.0002, loss=0.197]
Train Epoch #1:  89%|████████▉ | 9277/10399 [03:02<07:48,  2.40it/s, shape=torch.Size([128, 32, 16, 16]), global_step=9277, grad_norm=0.0231, lr=0.0002, loss=0.197]
Train Epoch #1:  89%|████████▉ | 9293/10399 [03:12<08:37,  2.14it/s, shape=torch.Size([128, 32, 16, 16]), global_step=9293, grad_norm=0.0251, lr=0.0002, loss=0.197]
Train Epoch #1:  90%|████████▉ | 9309/10399 [03:23<09:17,  1.95it/s, shape=torch.Size([128, 32, 16, 16]), global_step=9309, grad_norm=0.0227, lr=0.0002, loss=0.198]
Train Epoch #1:  90%|████████▉ | 9325/10399 [03:33<09:47,  1.83it/s, shape=torch.Size([128, 32, 16, 16]), global_step=9325, grad_norm=0.021, lr=0.0002, loss=0.198] 
Train Epoch #1:  90%|████████▉ | 9341/10399 [03:44<10:08,  1.74it/s, shape=torch.Size([128, 32, 16, 16]), global_step=9341, grad_norm=0.0221, lr=0.0002, loss=0.198]
Train Epoch #1:  90%|████████▉ | 9357/10399 [03:54<10:21,  1.68it/s, shape=torch.Size([128, 32, 16, 16]), global_step=9357, grad_norm=0.0223, lr=0.0002, loss=0.198]
Train Epoch #1:  90%|█████████ | 9373/10399 [04:05<10:30,  1.63it/s, shape=torch.Size([128, 32, 16, 16]), global_step=9373, grad_norm=0.0207, lr=0.0002, loss=0.198]
Train Epoch #1:  90%|█████████ | 9389/10399 [04:15<10:31,  1.60it/s, shape=torch.Size([128, 32, 16, 16]), global_step=9389, grad_norm=0.0208, lr=0.0002, loss=0.198]
Train Epoch #1:  90%|█████████ | 9404/10399 [04:25<10:21,  1.60it/s, shape=torch.Size([128, 32, 16, 16]), global_step=9404, grad_norm=0.0214, lr=0.0002, loss=0.198]
Train Epoch #1:  90%|█████████ | 9405/10399 [04:25<10:29,  1.58it/s, shape=torch.Size([128, 32, 16, 16]), global_step=9405, grad_norm=0.0263, lr=0.0002, loss=0.198]
Train Epoch #1:  91%|█████████ | 9421/10399 [04:36<10:24,  1.57it/s, shape=torch.Size([128, 32, 16, 16]), global_step=9421, grad_norm=0.0238, lr=0.0002, loss=0.198]
Train Epoch #1:  91%|█████████ | 9437/10399 [04:46<10:18,  1.56it/s, shape=torch.Size([128, 32, 16, 16]), global_step=9437, grad_norm=0.0239, lr=0.0002, loss=0.198]
Train Epoch #1:  91%|█████████ | 9453/10399 [04:57<10:10,  1.55it/s, shape=torch.Size([128, 32, 16, 16]), global_step=9453, grad_norm=0.0219, lr=0.0002, loss=0.198]
Train Epoch #1:  91%|█████████ | 9469/10399 [05:07<10:02,  1.54it/s, shape=torch.Size([128, 32, 16, 16]), global_step=9469, grad_norm=0.0208, lr=0.0002, loss=0.198]
Train Epoch #1:  91%|█████████ | 9485/10399 [05:18<09:53,  1.54it/s, shape=torch.Size([128, 32, 16, 16]), global_step=9485, grad_norm=0.0222, lr=0.0002, loss=0.197]
Train Epoch #1:  91%|█████████▏| 9501/10399 [05:28<09:43,  1.54it/s, shape=torch.Size([128, 32, 16, 16]), global_step=9501, grad_norm=0.0239, lr=0.0002, loss=0.198]
Train Epoch #1:  92%|█████████▏| 9517/10399 [05:38<09:33,  1.54it/s, shape=torch.Size([128, 32, 16, 16]), global_step=9517, grad_norm=0.0226, lr=0.0002, loss=0.198]
Train Epoch #1:  92%|█████████▏| 9533/10399 [05:49<09:23,  1.54it/s, shape=torch.Size([128, 32, 16, 16]), global_step=9533, grad_norm=0.0206, lr=0.0002, loss=0.198]
Train Epoch #1:  92%|█████████▏| 9549/10399 [05:59<09:13,  1.54it/s, shape=torch.Size([128, 32, 16, 16]), global_step=9549, grad_norm=0.0228, lr=0.0002, loss=0.198]
Train Epoch #1:  92%|█████████▏| 9564/10399 [06:10<09:03,  1.54it/s, shape=torch.Size([128, 32, 16, 16]), global_step=9564, grad_norm=0.0222, lr=0.0002, loss=0.198]
Train Epoch #1:  92%|█████████▏| 9565/10399 [06:10<09:03,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=9565, grad_norm=0.0205, lr=0.0002, loss=0.198]
Train Epoch #1:  92%|█████████▏| 9581/10399 [06:20<08:53,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=9581, grad_norm=0.02, lr=0.0002, loss=0.198]  
Train Epoch #1:  92%|█████████▏| 9597/10399 [06:31<08:42,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=9597, grad_norm=0.0215, lr=0.0002, loss=0.198]
Train Epoch #1:  92%|█████████▏| 9613/10399 [06:41<08:32,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=9613, grad_norm=0.0196, lr=0.0002, loss=0.198]
Train Epoch #1:  93%|█████████▎| 9629/10399 [06:51<08:22,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=9629, grad_norm=0.0196, lr=0.0002, loss=0.198]
Train Epoch #1:  93%|█████████▎| 9645/10399 [07:02<08:11,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=9645, grad_norm=0.0267, lr=0.0002, loss=0.198]
Train Epoch #1:  93%|█████████▎| 9661/10399 [07:12<08:01,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=9661, grad_norm=0.0244, lr=0.0002, loss=0.198]
Train Epoch #1:  93%|█████████▎| 9677/10399 [07:23<07:50,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=9677, grad_norm=0.0216, lr=0.0002, loss=0.198]
Train Epoch #1:  93%|█████████▎| 9693/10399 [07:33<07:40,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=9693, grad_norm=0.0214, lr=0.0002, loss=0.198]
Train Epoch #1:  93%|█████████▎| 9709/10399 [07:44<07:30,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=9709, grad_norm=0.0246, lr=0.0002, loss=0.198]
Train Epoch #1:  94%|█████████▎| 9725/10399 [07:54<07:19,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=9725, grad_norm=0.0205, lr=0.0002, loss=0.198]
Train Epoch #1:  94%|█████████▎| 9741/10399 [08:05<07:09,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=9741, grad_norm=0.0241, lr=0.0002, loss=0.198]
Train Epoch #1:  94%|█████████▍| 9757/10399 [08:15<06:58,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=9757, grad_norm=0.0201, lr=0.0002, loss=0.198]
Train Epoch #1:  94%|█████████▍| 9772/10399 [08:25<06:49,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=9772, grad_norm=0.0214, lr=0.0002, loss=0.197]
Train Epoch #1:  94%|█████████▍| 9773/10399 [08:25<06:48,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=9773, grad_norm=0.0241, lr=0.0002, loss=0.197]
Train Epoch #1:  94%|█████████▍| 9789/10399 [08:36<06:39,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=9789, grad_norm=0.0222, lr=0.0002, loss=0.197]
Train Epoch #1:  94%|█████████▍| 9805/10399 [08:46<06:28,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=9805, grad_norm=0.025, lr=0.0002, loss=0.197] 
Train Epoch #1:  94%|█████████▍| 9821/10399 [08:57<06:17,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=9821, grad_norm=0.0206, lr=0.0002, loss=0.197]
Train Epoch #1:  95%|█████████▍| 9837/10399 [09:07<06:07,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=9837, grad_norm=0.0255, lr=0.0002, loss=0.197]
Train Epoch #1:  95%|█████████▍| 9853/10399 [09:18<05:56,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=9853, grad_norm=0.0198, lr=0.0002, loss=0.197]
Train Epoch #1:  95%|█████████▍| 9869/10399 [09:28<05:45,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=9869, grad_norm=0.02, lr=0.0002, loss=0.198]  
Train Epoch #1:  95%|█████████▌| 9885/10399 [09:39<05:35,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=9885, grad_norm=0.0209, lr=0.0002, loss=0.198]
Train Epoch #1:  95%|█████████▌| 9901/10399 [09:49<05:24,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=9901, grad_norm=0.0239, lr=0.0002, loss=0.198]
Train Epoch #1:  95%|█████████▌| 9917/10399 [09:59<05:14,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=9917, grad_norm=0.0241, lr=0.0002, loss=0.198]
Train Epoch #1:  96%|█████████▌| 9932/10399 [10:10<05:04,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=9932, grad_norm=0.0203, lr=0.0002, loss=0.198]
Train Epoch #1:  96%|█████████▌| 9933/10399 [10:10<05:03,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=9933, grad_norm=0.0207, lr=0.0002, loss=0.198]
Train Epoch #1:  96%|█████████▌| 9949/10399 [10:20<04:53,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=9949, grad_norm=0.0219, lr=0.0002, loss=0.198]
Train Epoch #1:  96%|█████████▌| 9965/10399 [10:31<04:42,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=9965, grad_norm=0.0222, lr=0.0002, loss=0.198]
Train Epoch #1:  96%|█████████▌| 9981/10399 [10:41<04:32,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=9981, grad_norm=0.0244, lr=0.0002, loss=0.198]
Train Epoch #1:  96%|█████████▌| 9997/10399 [10:52<04:22,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=9997, grad_norm=0.0215, lr=0.0002, loss=0.198]
Train Epoch #1:  96%|█████████▌| 10000/10399 [11:05<04:20,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=1e+4, grad_norm=0.0221, lr=0.0002, loss=0.198]
save model to .exp/diffusion/imagenet_512/dc_ae_f32c32_in_1.0/dit_xl_1/bs_1024_lr_2e-4_fp16/checkpoint.pt at step 10000

Train Epoch #1:  96%|█████████▌| 10001/10399 [11:08<06:30,  1.02it/s, shape=torch.Size([128, 32, 16, 16]), global_step=1e+4, grad_norm=0.0236, lr=0.0002, loss=0.198]
Train Epoch #1:  96%|█████████▋| 10017/10399 [11:18<05:30,  1.16it/s, shape=torch.Size([128, 32, 16, 16]), global_step=1e+4, grad_norm=0.0199, lr=0.0002, loss=0.198]
Train Epoch #1:  96%|█████████▋| 10033/10399 [11:29<04:49,  1.26it/s, shape=torch.Size([128, 32, 16, 16]), global_step=1e+4, grad_norm=0.021, lr=0.0002, loss=0.198] 
Train Epoch #1:  97%|█████████▋| 10049/10399 [11:39<04:21,  1.34it/s, shape=torch.Size([128, 32, 16, 16]), global_step=1e+4, grad_norm=0.0242, lr=0.0002, loss=0.198]
Train Epoch #1:  97%|█████████▋| 10064/10399 [11:50<04:10,  1.34it/s, shape=torch.Size([128, 32, 16, 16]), global_step=10064, grad_norm=0.0209, lr=0.0002, loss=0.198]
Train Epoch #1:  97%|█████████▋| 10065/10399 [11:50<03:59,  1.40it/s, shape=torch.Size([128, 32, 16, 16]), global_step=10065, grad_norm=0.0244, lr=0.0002, loss=0.198]
Train Epoch #1:  97%|█████████▋| 10081/10399 [12:00<03:41,  1.44it/s, shape=torch.Size([128, 32, 16, 16]), global_step=10081, grad_norm=0.0244, lr=0.0002, loss=0.198]
Train Epoch #1:  97%|█████████▋| 10097/10399 [12:10<03:26,  1.46it/s, shape=torch.Size([128, 32, 16, 16]), global_step=10097, grad_norm=0.0224, lr=0.0002, loss=0.198]
Train Epoch #1:  97%|█████████▋| 10113/10399 [12:21<03:12,  1.48it/s, shape=torch.Size([128, 32, 16, 16]), global_step=10113, grad_norm=0.0186, lr=0.0002, loss=0.198]
Train Epoch #1:  97%|█████████▋| 10129/10399 [12:31<03:00,  1.50it/s, shape=torch.Size([128, 32, 16, 16]), global_step=10129, grad_norm=0.0244, lr=0.0002, loss=0.198]
Train Epoch #1:  98%|█████████▊| 10145/10399 [12:42<02:48,  1.51it/s, shape=torch.Size([128, 32, 16, 16]), global_step=10145, grad_norm=0.0211, lr=0.0002, loss=0.198]
Train Epoch #1:  98%|█████████▊| 10161/10399 [12:52<02:37,  1.52it/s, shape=torch.Size([128, 32, 16, 16]), global_step=10161, grad_norm=0.0245, lr=0.0002, loss=0.198]
Train Epoch #1:  98%|█████████▊| 10177/10399 [13:03<02:26,  1.52it/s, shape=torch.Size([128, 32, 16, 16]), global_step=10177, grad_norm=0.0256, lr=0.0002, loss=0.198]
Train Epoch #1:  98%|█████████▊| 10193/10399 [13:13<02:15,  1.52it/s, shape=torch.Size([128, 32, 16, 16]), global_step=10193, grad_norm=0.0213, lr=0.0002, loss=0.198]
Train Epoch #1:  98%|█████████▊| 10209/10399 [13:24<02:04,  1.52it/s, shape=torch.Size([128, 32, 16, 16]), global_step=10209, grad_norm=0.0218, lr=0.0002, loss=0.198]
Train Epoch #1:  98%|█████████▊| 10225/10399 [13:34<01:54,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=10225, grad_norm=0.0242, lr=0.0002, loss=0.198]
Train Epoch #1:  98%|█████████▊| 10241/10399 [13:45<01:43,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=10241, grad_norm=0.0226, lr=0.0002, loss=0.198]
Train Epoch #1:  99%|█████████▊| 10257/10399 [13:55<01:32,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=10257, grad_norm=0.0196, lr=0.0002, loss=0.198]
Train Epoch #1:  99%|█████████▉| 10272/10399 [14:05<01:23,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=10272, grad_norm=0.0201, lr=0.0002, loss=0.198]
Train Epoch #1:  99%|█████████▉| 10273/10399 [14:05<01:22,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=10273, grad_norm=0.0217, lr=0.0002, loss=0.198]
Train Epoch #1:  99%|█████████▉| 10289/10399 [14:16<01:11,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=10289, grad_norm=0.0251, lr=0.0002, loss=0.198]
Train Epoch #1:  99%|█████████▉| 10305/10399 [14:26<01:01,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=10305, grad_norm=0.021, lr=0.0002, loss=0.198] 
Train Epoch #1:  99%|█████████▉| 10321/10399 [14:37<00:50,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=10321, grad_norm=0.021, lr=0.0002, loss=0.198]
Train Epoch #1:  99%|█████████▉| 10337/10399 [14:47<00:40,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=10337, grad_norm=0.022, lr=0.0002, loss=0.198]
Train Epoch #1: 100%|█████████▉| 10353/10399 [14:58<00:30,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=10353, grad_norm=0.021, lr=0.0002, loss=0.198]
Train Epoch #1: 100%|█████████▉| 10369/10399 [15:08<00:19,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=10369, grad_norm=0.021, lr=0.0002, loss=0.198]
Train Epoch #1: 100%|█████████▉| 10385/10399 [15:19<00:09,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=10385, grad_norm=0.0207, lr=0.0002, loss=0.198]
Train Epoch #1: 100%|██████████| 10399/10399 [15:28<00:00, 11.20it/s, shape=torch.Size([128, 32, 16, 16]), global_step=10399, grad_norm=0.0234, lr=0.0002, loss=0.198]
train info dict: {'loss': 0.19794975221157074}

Train Epoch #2:   0%|          | 0/10399 [00:00<?, ?it/s]
Train Epoch #2:   0%|          | 14/10399 [00:10<2:04:09,  1.39it/s, shape=torch.Size([128, 32, 16, 16]), global_step=10413, grad_norm=0.0194, lr=0.0002, loss=0.198]
Train Epoch #2:   0%|          | 30/10399 [00:20<1:56:59,  1.48it/s, shape=torch.Size([128, 32, 16, 16]), global_step=10429, grad_norm=0.0242, lr=0.0002, loss=0.194]
Train Epoch #2:   0%|          | 46/10399 [00:30<1:54:50,  1.50it/s, shape=torch.Size([128, 32, 16, 16]), global_step=10445, grad_norm=0.0251, lr=0.0002, loss=0.195]
Train Epoch #2:   1%|          | 62/10399 [00:41<1:53:46,  1.51it/s, shape=torch.Size([128, 32, 16, 16]), global_step=10461, grad_norm=0.0221, lr=0.0002, loss=0.197]
Train Epoch #2:   1%|          | 77/10399 [00:51<1:53:36,  1.51it/s, shape=torch.Size([128, 32, 16, 16]), global_step=10476, grad_norm=0.0214, lr=0.0002, loss=0.198]
Train Epoch #2:   1%|          | 78/10399 [00:51<1:53:05,  1.52it/s, shape=torch.Size([128, 32, 16, 16]), global_step=10477, grad_norm=0.0234, lr=0.0002, loss=0.198]
Train Epoch #2:   1%|          | 94/10399 [01:02<1:52:36,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=10493, grad_norm=0.0214, lr=0.0002, loss=0.198]
Train Epoch #2:   1%|          | 110/10399 [01:12<1:52:14,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=10509, grad_norm=0.0231, lr=0.0002, loss=0.199]
Train Epoch #2:   1%|          | 126/10399 [01:23<1:51:56,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=10525, grad_norm=0.0227, lr=0.0002, loss=0.197]
Train Epoch #2:   1%|▏         | 142/10399 [01:33<1:51:42,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=10541, grad_norm=0.0235, lr=0.0002, loss=0.198]
Train Epoch #2:   2%|▏         | 158/10399 [01:43<1:51:28,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=10557, grad_norm=0.0214, lr=0.0002, loss=0.199]
Train Epoch #2:   2%|▏         | 174/10399 [01:54<1:51:16,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=10573, grad_norm=0.0227, lr=0.0002, loss=0.199]
Train Epoch #2:   2%|▏         | 190/10399 [02:05<1:51:40,  1.52it/s, shape=torch.Size([128, 32, 16, 16]), global_step=10589, grad_norm=0.0255, lr=0.0002, loss=0.199]
Train Epoch #2:   2%|▏         | 206/10399 [02:15<1:51:17,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=10605, grad_norm=0.0216, lr=0.0002, loss=0.198]
Train Epoch #2:   2%|▏         | 222/10399 [02:25<1:50:59,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=10621, grad_norm=0.0203, lr=0.0002, loss=0.197]
Train Epoch #2:   2%|▏         | 238/10399 [02:36<1:50:44,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=10637, grad_norm=0.024, lr=0.0002, loss=0.197] 
Train Epoch #2:   2%|▏         | 254/10399 [02:46<1:50:30,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=10653, grad_norm=0.0212, lr=0.0002, loss=0.197]
Train Epoch #2:   3%|▎         | 269/10399 [02:57<1:50:20,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=10668, grad_norm=0.0204, lr=0.0002, loss=0.197]
Train Epoch #2:   3%|▎         | 270/10399 [02:57<1:50:18,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=10669, grad_norm=0.0216, lr=0.0002, loss=0.197]
Train Epoch #2:   3%|▎         | 286/10399 [03:07<1:50:05,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=10685, grad_norm=0.0223, lr=0.0002, loss=0.197]
Train Epoch #2:   3%|▎         | 302/10399 [03:18<1:49:54,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=10701, grad_norm=0.0207, lr=0.0002, loss=0.197]
Train Epoch #2:   3%|▎         | 318/10399 [03:28<1:49:40,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=10717, grad_norm=0.0217, lr=0.0002, loss=0.197]
Train Epoch #2:   3%|▎         | 334/10399 [03:39<1:49:30,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=10733, grad_norm=0.0198, lr=0.0002, loss=0.196]
Train Epoch #2:   3%|▎         | 350/10399 [03:49<1:49:20,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=10749, grad_norm=0.0207, lr=0.0002, loss=0.196]
Train Epoch #2:   4%|▎         | 366/10399 [03:59<1:49:11,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=10765, grad_norm=0.0221, lr=0.0002, loss=0.196]
Train Epoch #2:   4%|▎         | 382/10399 [04:10<1:49:01,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=10781, grad_norm=0.02, lr=0.0002, loss=0.196]  
Train Epoch #2:   4%|▍         | 398/10399 [04:20<1:48:49,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=10797, grad_norm=0.023, lr=0.0002, loss=0.196]
Train Epoch #2:   4%|▍         | 414/10399 [04:31<1:48:38,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=10813, grad_norm=0.0215, lr=0.0002, loss=0.196]
Train Epoch #2:   4%|▍         | 429/10399 [04:41<1:48:28,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=10828, grad_norm=0.0212, lr=0.0002, loss=0.196]
Train Epoch #2:   4%|▍         | 430/10399 [04:41<1:48:27,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=10829, grad_norm=0.0227, lr=0.0002, loss=0.196]
Train Epoch #2:   4%|▍         | 446/10399 [04:52<1:48:17,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=10845, grad_norm=0.0214, lr=0.0002, loss=0.196]
Train Epoch #2:   4%|▍         | 462/10399 [05:02<1:48:08,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=10861, grad_norm=0.0231, lr=0.0002, loss=0.196]
Train Epoch #2:   5%|▍         | 478/10399 [05:13<1:47:56,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=10877, grad_norm=0.0183, lr=0.0002, loss=0.196]
Train Epoch #2:   5%|▍         | 494/10399 [05:23<1:47:46,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=10893, grad_norm=0.0192, lr=0.0002, loss=0.196]
Train Epoch #2:   5%|▍         | 510/10399 [05:33<1:47:35,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=10909, grad_norm=0.0197, lr=0.0002, loss=0.196]
Train Epoch #2:   5%|▌         | 526/10399 [05:44<1:47:25,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=10925, grad_norm=0.0211, lr=0.0002, loss=0.196]
Train Epoch #2:   5%|▌         | 542/10399 [05:54<1:47:15,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=10941, grad_norm=0.02, lr=0.0002, loss=0.196]  
Train Epoch #2:   5%|▌         | 558/10399 [06:05<1:47:30,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=10957, grad_norm=0.0204, lr=0.0002, loss=0.196]
Train Epoch #2:   6%|▌         | 574/10399 [06:15<1:47:10,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=10973, grad_norm=0.0207, lr=0.0002, loss=0.196]
Train Epoch #2:   6%|▌         | 590/10399 [06:26<1:46:53,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=10989, grad_norm=0.0198, lr=0.0002, loss=0.196]
Train Epoch #2:   6%|▌         | 601/10399 [06:37<1:46:46,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=11000, grad_norm=0.0216, lr=0.0002, loss=0.196]
save model to .exp/diffusion/imagenet_512/dc_ae_f32c32_in_1.0/dit_xl_1/bs_1024_lr_2e-4_fp16/checkpoint.pt at step 11000

Train Epoch #2:   6%|▌         | 602/10399 [06:47<2:32:25,  1.07it/s, shape=torch.Size([128, 32, 16, 16]), global_step=11001, grad_norm=0.023, lr=0.0002, loss=0.196] 
Train Epoch #2:   6%|▌         | 618/10399 [06:58<2:17:34,  1.18it/s, shape=torch.Size([128, 32, 16, 16]), global_step=11017, grad_norm=0.0203, lr=0.0002, loss=0.197]
Train Epoch #2:   6%|▌         | 634/10399 [07:08<2:07:32,  1.28it/s, shape=torch.Size([128, 32, 16, 16]), global_step=11033, grad_norm=0.0191, lr=0.0002, loss=0.197]
Train Epoch #2:   6%|▋         | 650/10399 [07:19<2:00:42,  1.35it/s, shape=torch.Size([128, 32, 16, 16]), global_step=11049, grad_norm=0.0249, lr=0.0002, loss=0.196]
Train Epoch #2:   6%|▋         | 666/10399 [07:29<1:55:59,  1.40it/s, shape=torch.Size([128, 32, 16, 16]), global_step=11065, grad_norm=0.0193, lr=0.0002, loss=0.197]
Train Epoch #2:   7%|▋         | 682/10399 [07:40<1:52:42,  1.44it/s, shape=torch.Size([128, 32, 16, 16]), global_step=11081, grad_norm=0.0222, lr=0.0002, loss=0.197]
Train Epoch #2:   7%|▋         | 698/10399 [07:50<1:50:20,  1.47it/s, shape=torch.Size([128, 32, 16, 16]), global_step=11097, grad_norm=0.0194, lr=0.0002, loss=0.197]
Train Epoch #2:   7%|▋         | 714/10399 [08:00<1:48:39,  1.49it/s, shape=torch.Size([128, 32, 16, 16]), global_step=11113, grad_norm=0.0243, lr=0.0002, loss=0.197]
Train Epoch #2:   7%|▋         | 730/10399 [08:11<1:47:26,  1.50it/s, shape=torch.Size([128, 32, 16, 16]), global_step=11129, grad_norm=0.0194, lr=0.0002, loss=0.197]
Train Epoch #2:   7%|▋         | 745/10399 [08:21<1:47:16,  1.50it/s, shape=torch.Size([128, 32, 16, 16]), global_step=11144, grad_norm=0.022, lr=0.0002, loss=0.197] 
Train Epoch #2:   7%|▋         | 746/10399 [08:21<1:46:31,  1.51it/s, shape=torch.Size([128, 32, 16, 16]), global_step=11145, grad_norm=0.0198, lr=0.0002, loss=0.197]
Train Epoch #2:   7%|▋         | 762/10399 [08:32<1:45:51,  1.52it/s, shape=torch.Size([128, 32, 16, 16]), global_step=11161, grad_norm=0.0206, lr=0.0002, loss=0.197]
Train Epoch #2:   7%|▋         | 778/10399 [08:42<1:45:21,  1.52it/s, shape=torch.Size([128, 32, 16, 16]), global_step=11177, grad_norm=0.0206, lr=0.0002, loss=0.197]
Train Epoch #2:   8%|▊         | 794/10399 [08:53<1:44:58,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=11193, grad_norm=0.0219, lr=0.0002, loss=0.197]
Train Epoch #2:   8%|▊         | 810/10399 [09:03<1:44:39,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=11209, grad_norm=0.0198, lr=0.0002, loss=0.197]
Train Epoch #2:   8%|▊         | 826/10399 [09:13<1:44:23,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=11225, grad_norm=0.023, lr=0.0002, loss=0.197] 
Train Epoch #2:   8%|▊         | 842/10399 [09:24<1:44:07,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=11241, grad_norm=0.0216, lr=0.0002, loss=0.197]
Train Epoch #2:   8%|▊         | 858/10399 [09:34<1:43:53,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=11257, grad_norm=0.0213, lr=0.0002, loss=0.197]
Train Epoch #2:   8%|▊         | 874/10399 [09:45<1:43:42,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=11273, grad_norm=0.0214, lr=0.0002, loss=0.197]
Train Epoch #2:   9%|▊         | 890/10399 [09:55<1:43:30,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=11289, grad_norm=0.0205, lr=0.0002, loss=0.197]
Train Epoch #2:   9%|▊         | 906/10399 [10:06<1:43:42,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=11305, grad_norm=0.0198, lr=0.0002, loss=0.197]
Train Epoch #2:   9%|▉         | 922/10399 [10:16<1:43:24,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=11321, grad_norm=0.0204, lr=0.0002, loss=0.197]
Train Epoch #2:   9%|▉         | 938/10399 [10:27<1:43:07,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=11337, grad_norm=0.0193, lr=0.0002, loss=0.197]
Train Epoch #2:   9%|▉         | 953/10399 [10:37<1:42:57,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=11352, grad_norm=0.0198, lr=0.0002, loss=0.197]
Train Epoch #2:   9%|▉         | 954/10399 [10:37<1:42:51,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=11353, grad_norm=0.017, lr=0.0002, loss=0.197] 
Train Epoch #2:   9%|▉         | 970/10399 [10:48<1:42:36,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=11369, grad_norm=0.0204, lr=0.0002, loss=0.196]
Train Epoch #2:   9%|▉         | 986/10399 [10:58<1:42:24,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=11385, grad_norm=0.02, lr=0.0002, loss=0.196]  
Train Epoch #2:  10%|▉         | 1002/10399 [11:08<1:42:12,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=11401, grad_norm=0.0227, lr=0.0002, loss=0.197]
Train Epoch #2:  10%|▉         | 1018/10399 [11:19<1:42:01,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=11417, grad_norm=0.0205, lr=0.0002, loss=0.197]
Train Epoch #2:  10%|▉         | 1034/10399 [11:29<1:41:50,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=11433, grad_norm=0.0214, lr=0.0002, loss=0.197]
Train Epoch #2:  10%|█         | 1050/10399 [11:40<1:41:40,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=11449, grad_norm=0.019, lr=0.0002, loss=0.197] 
Train Epoch #2:  10%|█         | 1066/10399 [11:50<1:41:29,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=11465, grad_norm=0.0195, lr=0.0002, loss=0.197]
Train Epoch #2:  10%|█         | 1082/10399 [12:01<1:41:19,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=11481, grad_norm=0.0213, lr=0.0002, loss=0.197]
Train Epoch #2:  11%|█         | 1098/10399 [12:11<1:41:07,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=11497, grad_norm=0.0185, lr=0.0002, loss=0.197]
Train Epoch #2:  11%|█         | 1113/10399 [12:21<1:40:58,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=11512, grad_norm=0.0187, lr=0.0002, loss=0.197]
Train Epoch #2:  11%|█         | 1114/10399 [12:21<1:40:55,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=11513, grad_norm=0.0182, lr=0.0002, loss=0.197]
Train Epoch #2:  11%|█         | 1130/10399 [12:32<1:40:45,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=11529, grad_norm=0.0201, lr=0.0002, loss=0.197]
Train Epoch #2:  11%|█         | 1146/10399 [12:42<1:40:35,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=11545, grad_norm=0.019, lr=0.0002, loss=0.197] 
Train Epoch #2:  11%|█         | 1162/10399 [12:53<1:40:24,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=11561, grad_norm=0.0199, lr=0.0002, loss=0.197]
Train Epoch #2:  11%|█▏        | 1178/10399 [13:03<1:40:15,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=11577, grad_norm=0.0186, lr=0.0002, loss=0.197]
Train Epoch #2:  11%|█▏        | 1194/10399 [13:14<1:40:07,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=11593, grad_norm=0.0201, lr=0.0002, loss=0.197]
Train Epoch #2:  12%|█▏        | 1210/10399 [13:24<1:39:57,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=11609, grad_norm=0.0235, lr=0.0002, loss=0.197]
Train Epoch #2:  12%|█▏        | 1226/10399 [13:35<1:39:46,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=11625, grad_norm=0.0183, lr=0.0002, loss=0.197]
Train Epoch #2:  12%|█▏        | 1242/10399 [13:45<1:39:36,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=11641, grad_norm=0.0214, lr=0.0002, loss=0.197]
Train Epoch #2:  12%|█▏        | 1258/10399 [13:55<1:39:26,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=11657, grad_norm=0.0215, lr=0.0002, loss=0.197]
Train Epoch #2:  12%|█▏        | 1274/10399 [14:06<1:39:14,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=11673, grad_norm=0.0191, lr=0.0002, loss=0.197]
Train Epoch #2:  12%|█▏        | 1290/10399 [14:16<1:39:03,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=11689, grad_norm=0.0193, lr=0.0002, loss=0.197]
Train Epoch #2:  13%|█▎        | 1305/10399 [14:27<1:38:53,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=11704, grad_norm=0.0163, lr=0.0002, loss=0.197]
Train Epoch #2:  13%|█▎        | 1306/10399 [14:27<1:38:53,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=11705, grad_norm=0.0188, lr=0.0002, loss=0.197]
Train Epoch #2:  13%|█▎        | 1322/10399 [14:37<1:38:42,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=11721, grad_norm=0.0194, lr=0.0002, loss=0.197]
Train Epoch #2:  13%|█▎        | 1338/10399 [14:48<1:38:55,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=11737, grad_norm=0.0196, lr=0.0002, loss=0.197]
Train Epoch #2:  13%|█▎        | 1354/10399 [14:58<1:38:37,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=11753, grad_norm=0.0199, lr=0.0002, loss=0.197]
Train Epoch #2:  13%|█▎        | 1370/10399 [15:09<1:38:21,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=11769, grad_norm=0.0187, lr=0.0002, loss=0.197]
Train Epoch #2:  13%|█▎        | 1386/10399 [15:19<1:38:07,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=11785, grad_norm=0.0213, lr=0.0002, loss=0.197]
Train Epoch #2:  13%|█▎        | 1402/10399 [15:30<1:37:55,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=11801, grad_norm=0.0197, lr=0.0002, loss=0.197]
Train Epoch #2:  14%|█▎        | 1418/10399 [15:40<1:37:42,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=11817, grad_norm=0.0171, lr=0.0002, loss=0.197]
Train Epoch #2:  14%|█▍        | 1434/10399 [15:50<1:37:29,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=11833, grad_norm=0.0183, lr=0.0002, loss=0.197]
Train Epoch #2:  14%|█▍        | 1450/10399 [16:01<1:37:18,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=11849, grad_norm=0.0186, lr=0.0002, loss=0.197]
Train Epoch #2:  14%|█▍        | 1465/10399 [16:11<1:37:08,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=11864, grad_norm=0.0199, lr=0.0002, loss=0.197]
Train Epoch #2:  14%|█▍        | 1466/10399 [16:11<1:37:09,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=11865, grad_norm=0.0206, lr=0.0002, loss=0.197]
Train Epoch #2:  14%|█▍        | 1482/10399 [16:22<1:36:59,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=11881, grad_norm=0.0178, lr=0.0002, loss=0.197]
Train Epoch #2:  14%|█▍        | 1498/10399 [16:32<1:36:49,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=11897, grad_norm=0.0175, lr=0.0002, loss=0.197]
Train Epoch #2:  15%|█▍        | 1514/10399 [16:43<1:36:37,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=11913, grad_norm=0.0192, lr=0.0002, loss=0.196]
Train Epoch #2:  15%|█▍        | 1530/10399 [16:53<1:36:26,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=11929, grad_norm=0.018, lr=0.0002, loss=0.196] 
Train Epoch #2:  15%|█▍        | 1546/10399 [17:03<1:36:15,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=11945, grad_norm=0.0184, lr=0.0002, loss=0.196]
Train Epoch #2:  15%|█▌        | 1562/10399 [17:14<1:36:05,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=11961, grad_norm=0.0176, lr=0.0002, loss=0.196]
Train Epoch #2:  15%|█▌        | 1578/10399 [17:24<1:35:54,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=11977, grad_norm=0.0181, lr=0.0002, loss=0.196]
Train Epoch #2:  15%|█▌        | 1594/10399 [17:35<1:35:44,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=11993, grad_norm=0.018, lr=0.0002, loss=0.196] 
Train Epoch #2:  15%|█▌        | 1601/10399 [17:47<1:35:39,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=12000, grad_norm=0.0201, lr=0.0002, loss=0.197]
save model to .exp/diffusion/imagenet_512/dc_ae_f32c32_in_1.0/dit_xl_1/bs_1024_lr_2e-4_fp16/checkpoint.pt at step 12000

Train Epoch #2:  15%|█▌        | 1602/10399 [17:54<2:20:08,  1.05it/s, shape=torch.Size([128, 32, 16, 16]), global_step=12001, grad_norm=0.0178, lr=0.0002, loss=0.197]
Train Epoch #2:  16%|█▌        | 1618/10399 [18:04<2:04:55,  1.17it/s, shape=torch.Size([128, 32, 16, 16]), global_step=12017, grad_norm=0.0184, lr=0.0002, loss=0.197]
Train Epoch #2:  16%|█▌        | 1634/10399 [18:15<1:55:11,  1.27it/s, shape=torch.Size([128, 32, 16, 16]), global_step=12033, grad_norm=0.0152, lr=0.0002, loss=0.197]
Train Epoch #2:  16%|█▌        | 1650/10399 [18:25<1:48:42,  1.34it/s, shape=torch.Size([128, 32, 16, 16]), global_step=12049, grad_norm=0.0186, lr=0.0002, loss=0.197]
Train Epoch #2:  16%|█▌        | 1666/10399 [18:36<1:44:19,  1.40it/s, shape=torch.Size([128, 32, 16, 16]), global_step=12065, grad_norm=0.0179, lr=0.0002, loss=0.196]
Train Epoch #2:  16%|█▌        | 1682/10399 [18:46<1:41:15,  1.43it/s, shape=torch.Size([128, 32, 16, 16]), global_step=12081, grad_norm=0.0192, lr=0.0002, loss=0.197]
Train Epoch #2:  16%|█▋        | 1698/10399 [18:57<1:39:31,  1.46it/s, shape=torch.Size([128, 32, 16, 16]), global_step=12097, grad_norm=0.0212, lr=0.0002, loss=0.197]
Train Epoch #2:  16%|█▋        | 1713/10399 [19:07<1:39:21,  1.46it/s, shape=torch.Size([128, 32, 16, 16]), global_step=12112, grad_norm=0.0167, lr=0.0002, loss=0.197]
Train Epoch #2:  16%|█▋        | 1714/10399 [19:07<1:37:51,  1.48it/s, shape=torch.Size([128, 32, 16, 16]), global_step=12113, grad_norm=0.0165, lr=0.0002, loss=0.197]
Train Epoch #2:  17%|█▋        | 1730/10399 [19:17<1:36:39,  1.49it/s, shape=torch.Size([128, 32, 16, 16]), global_step=12129, grad_norm=0.0179, lr=0.0002, loss=0.197]
Train Epoch #2:  17%|█▋        | 1746/10399 [19:28<1:35:46,  1.51it/s, shape=torch.Size([128, 32, 16, 16]), global_step=12145, grad_norm=0.0198, lr=0.0002, loss=0.197]
Train Epoch #2:  17%|█▋        | 1762/10399 [19:38<1:35:07,  1.51it/s, shape=torch.Size([128, 32, 16, 16]), global_step=12161, grad_norm=0.021, lr=0.0002, loss=0.197] 
Train Epoch #2:  17%|█▋        | 1778/10399 [19:49<1:34:35,  1.52it/s, shape=torch.Size([128, 32, 16, 16]), global_step=12177, grad_norm=0.0166, lr=0.0002, loss=0.197]
Train Epoch #2:  17%|█▋        | 1794/10399 [19:59<1:34:10,  1.52it/s, shape=torch.Size([128, 32, 16, 16]), global_step=12193, grad_norm=0.0188, lr=0.0002, loss=0.197]
Train Epoch #2:  17%|█▋        | 1810/10399 [20:10<1:33:49,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=12209, grad_norm=0.0201, lr=0.0002, loss=0.197]
Train Epoch #2:  18%|█▊        | 1826/10399 [20:20<1:33:31,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=12225, grad_norm=0.02, lr=0.0002, loss=0.197]  
Train Epoch #2:  18%|█▊        | 1842/10399 [20:31<1:33:15,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=12241, grad_norm=0.0181, lr=0.0002, loss=0.197]
Train Epoch #2:  18%|█▊        | 1858/10399 [20:41<1:33:02,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=12257, grad_norm=0.0183, lr=0.0002, loss=0.197]
Train Epoch #2:  18%|█▊        | 1873/10399 [20:51<1:32:52,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=12272, grad_norm=0.0192, lr=0.0002, loss=0.197]
Train Epoch #2:  18%|█▊        | 1874/10399 [20:51<1:32:50,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=12273, grad_norm=0.0172, lr=0.0002, loss=0.197]
Train Epoch #2:  18%|█▊        | 1890/10399 [21:02<1:32:38,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=12289, grad_norm=0.0187, lr=0.0002, loss=0.197]
Train Epoch #2:  18%|█▊        | 1906/10399 [21:12<1:32:27,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=12305, grad_norm=0.0172, lr=0.0002, loss=0.197]
Train Epoch #2:  18%|█▊        | 1922/10399 [21:23<1:32:15,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=12321, grad_norm=0.0209, lr=0.0002, loss=0.197]
Train Epoch #2:  19%|█▊        | 1938/10399 [21:33<1:32:04,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=12337, grad_norm=0.0195, lr=0.0002, loss=0.197]
Train Epoch #2:  19%|█▉        | 1954/10399 [21:44<1:31:55,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=12353, grad_norm=0.0172, lr=0.0002, loss=0.197]
Train Epoch #2:  19%|█▉        | 1970/10399 [21:54<1:31:45,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=12369, grad_norm=0.0184, lr=0.0002, loss=0.197]
Train Epoch #2:  19%|█▉        | 1986/10399 [22:05<1:31:36,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=12385, grad_norm=0.0192, lr=0.0002, loss=0.197]
Train Epoch #2:  19%|█▉        | 2002/10399 [22:15<1:31:25,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=12401, grad_norm=0.0189, lr=0.0002, loss=0.197]
Train Epoch #2:  19%|█▉        | 2018/10399 [22:25<1:31:15,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=12417, grad_norm=0.0185, lr=0.0002, loss=0.197]
Train Epoch #2:  20%|█▉        | 2034/10399 [22:36<1:31:02,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=12433, grad_norm=0.02, lr=0.0002, loss=0.197]  
Train Epoch #2:  20%|█▉        | 2050/10399 [22:46<1:30:50,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=12449, grad_norm=0.0192, lr=0.0002, loss=0.197]
Train Epoch #2:  20%|█▉        | 2065/10399 [22:57<1:30:40,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=12464, grad_norm=0.0212, lr=0.0002, loss=0.197]
Train Epoch #2:  20%|█▉        | 2066/10399 [22:57<1:31:01,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=12465, grad_norm=0.0152, lr=0.0002, loss=0.197]
Train Epoch #2:  20%|██        | 2082/10399 [23:07<1:30:42,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=12481, grad_norm=0.0196, lr=0.0002, loss=0.197]
Train Epoch #2:  20%|██        | 2098/10399 [23:18<1:30:26,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=12497, grad_norm=0.0157, lr=0.0002, loss=0.197]
Train Epoch #2:  20%|██        | 2114/10399 [23:28<1:30:15,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=12513, grad_norm=0.0178, lr=0.0002, loss=0.197]
Train Epoch #2:  20%|██        | 2130/10399 [23:39<1:30:02,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=12529, grad_norm=0.0173, lr=0.0002, loss=0.197]
Train Epoch #2:  21%|██        | 2146/10399 [23:49<1:29:50,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=12545, grad_norm=0.0175, lr=0.0002, loss=0.197]
Train Epoch #2:  21%|██        | 2162/10399 [24:00<1:29:40,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=12561, grad_norm=0.0167, lr=0.0002, loss=0.197]
Train Epoch #2:  21%|██        | 2178/10399 [24:10<1:29:30,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=12577, grad_norm=0.0173, lr=0.0002, loss=0.197]
Train Epoch #2:  21%|██        | 2194/10399 [24:20<1:29:19,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=12593, grad_norm=0.0177, lr=0.0002, loss=0.197]
Train Epoch #2:  21%|██▏       | 2210/10399 [24:31<1:29:09,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=12609, grad_norm=0.0159, lr=0.0002, loss=0.197]
Train Epoch #2:  21%|██▏       | 2225/10399 [24:41<1:29:00,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=12624, grad_norm=0.0187, lr=0.0002, loss=0.197]
Train Epoch #2:  21%|██▏       | 2226/10399 [24:41<1:28:58,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=12625, grad_norm=0.0168, lr=0.0002, loss=0.197]
Train Epoch #2:  22%|██▏       | 2242/10399 [24:52<1:28:47,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=12641, grad_norm=0.0161, lr=0.0002, loss=0.196]
Train Epoch #2:  22%|██▏       | 2258/10399 [25:02<1:28:36,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=12657, grad_norm=0.0179, lr=0.0002, loss=0.196]
Train Epoch #2:  22%|██▏       | 2274/10399 [25:13<1:28:26,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=12673, grad_norm=0.0176, lr=0.0002, loss=0.196]
Train Epoch #2:  22%|██▏       | 2290/10399 [25:23<1:28:15,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=12689, grad_norm=0.0182, lr=0.0002, loss=0.196]
Train Epoch #2:  22%|██▏       | 2306/10399 [25:34<1:28:06,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=12705, grad_norm=0.0172, lr=0.0002, loss=0.196]
Train Epoch #2:  22%|██▏       | 2322/10399 [25:44<1:27:56,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=12721, grad_norm=0.0183, lr=0.0002, loss=0.196]
Train Epoch #2:  22%|██▏       | 2338/10399 [25:55<1:27:44,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=12737, grad_norm=0.0177, lr=0.0002, loss=0.196]
Train Epoch #2:  23%|██▎       | 2354/10399 [26:05<1:27:33,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=12753, grad_norm=0.0167, lr=0.0002, loss=0.196]
Train Epoch #2:  23%|██▎       | 2370/10399 [26:15<1:27:21,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=12769, grad_norm=0.0199, lr=0.0002, loss=0.196]
Train Epoch #2:  23%|██▎       | 2386/10399 [26:26<1:27:10,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=12785, grad_norm=0.0161, lr=0.0002, loss=0.196]
Train Epoch #2:  23%|██▎       | 2402/10399 [26:36<1:26:59,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=12801, grad_norm=0.0216, lr=0.0002, loss=0.196]
Train Epoch #2:  23%|██▎       | 2418/10399 [26:47<1:26:49,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=12817, grad_norm=0.0138, lr=0.0002, loss=0.196]
Train Epoch #2:  23%|██▎       | 2433/10399 [26:57<1:26:39,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=12832, grad_norm=0.0166, lr=0.0002, loss=0.196]
Train Epoch #2:  23%|██▎       | 2434/10399 [26:57<1:26:39,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=12833, grad_norm=0.0183, lr=0.0002, loss=0.196]
Train Epoch #2:  24%|██▎       | 2450/10399 [27:08<1:26:29,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=12849, grad_norm=0.0161, lr=0.0002, loss=0.196]
Train Epoch #2:  24%|██▎       | 2466/10399 [27:18<1:26:19,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=12865, grad_norm=0.0169, lr=0.0002, loss=0.196]
Train Epoch #2:  24%|██▍       | 2482/10399 [27:29<1:26:09,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=12881, grad_norm=0.0173, lr=0.0002, loss=0.196]
Train Epoch #2:  24%|██▍       | 2498/10399 [27:39<1:26:19,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=12897, grad_norm=0.0154, lr=0.0002, loss=0.196]
Train Epoch #2:  24%|██▍       | 2514/10399 [27:50<1:26:02,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=12913, grad_norm=0.017, lr=0.0002, loss=0.196] 
Train Epoch #2:  24%|██▍       | 2530/10399 [28:00<1:25:47,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=12929, grad_norm=0.0166, lr=0.0002, loss=0.196]
Train Epoch #2:  24%|██▍       | 2546/10399 [28:10<1:25:34,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=12945, grad_norm=0.0185, lr=0.0002, loss=0.196]
Train Epoch #2:  25%|██▍       | 2562/10399 [28:21<1:25:21,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=12961, grad_norm=0.0168, lr=0.0002, loss=0.196]
Train Epoch #2:  25%|██▍       | 2577/10399 [28:31<1:25:11,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=12976, grad_norm=0.0178, lr=0.0002, loss=0.196]
Train Epoch #2:  25%|██▍       | 2578/10399 [28:31<1:25:08,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=12977, grad_norm=0.0166, lr=0.0002, loss=0.196]
Train Epoch #2:  25%|██▍       | 2594/10399 [28:42<1:24:56,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=12993, grad_norm=0.0169, lr=0.0002, loss=0.196]
Train Epoch #2:  25%|██▌       | 2601/10399 [28:57<1:24:52,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=13000, grad_norm=0.0178, lr=0.0002, loss=0.196]
save model to .exp/diffusion/imagenet_512/dc_ae_f32c32_in_1.0/dit_xl_1/bs_1024_lr_2e-4_fp16/checkpoint.pt at step 13000

Train Epoch #2:  25%|██▌       | 2602/10399 [29:01<2:04:09,  1.05it/s, shape=torch.Size([128, 32, 16, 16]), global_step=13001, grad_norm=0.0171, lr=0.0002, loss=0.196]
Train Epoch #2:  25%|██▌       | 2617/10399 [29:11<2:03:54,  1.05it/s, shape=torch.Size([128, 32, 16, 16]), global_step=13016, grad_norm=0.0176, lr=0.0002, loss=0.196]
Train Epoch #2:  25%|██▌       | 2618/10399 [29:11<1:50:40,  1.17it/s, shape=torch.Size([128, 32, 16, 16]), global_step=13017, grad_norm=0.0158, lr=0.0002, loss=0.196]
Train Epoch #2:  25%|██▌       | 2634/10399 [29:22<1:42:00,  1.27it/s, shape=torch.Size([128, 32, 16, 16]), global_step=13033, grad_norm=0.017, lr=0.0002, loss=0.196] 
Train Epoch #2:  25%|██▌       | 2650/10399 [29:32<1:36:15,  1.34it/s, shape=torch.Size([128, 32, 16, 16]), global_step=13049, grad_norm=0.0162, lr=0.0002, loss=0.196]
Train Epoch #2:  26%|██▌       | 2666/10399 [29:42<1:32:21,  1.40it/s, shape=torch.Size([128, 32, 16, 16]), global_step=13065, grad_norm=0.017, lr=0.0002, loss=0.196] 
Train Epoch #2:  26%|██▌       | 2682/10399 [29:53<1:29:39,  1.43it/s, shape=torch.Size([128, 32, 16, 16]), global_step=13081, grad_norm=0.0167, lr=0.0002, loss=0.196]
Train Epoch #2:  26%|██▌       | 2698/10399 [30:03<1:27:43,  1.46it/s, shape=torch.Size([128, 32, 16, 16]), global_step=13097, grad_norm=0.0158, lr=0.0002, loss=0.196]
Train Epoch #2:  26%|██▌       | 2714/10399 [30:14<1:26:19,  1.48it/s, shape=torch.Size([128, 32, 16, 16]), global_step=13113, grad_norm=0.016, lr=0.0002, loss=0.196] 
Train Epoch #2:  26%|██▋       | 2730/10399 [30:24<1:25:20,  1.50it/s, shape=torch.Size([128, 32, 16, 16]), global_step=13129, grad_norm=0.0165, lr=0.0002, loss=0.196]
Train Epoch #2:  26%|██▋       | 2746/10399 [30:35<1:24:35,  1.51it/s, shape=torch.Size([128, 32, 16, 16]), global_step=13145, grad_norm=0.0159, lr=0.0002, loss=0.196]
Train Epoch #2:  27%|██▋       | 2762/10399 [30:45<1:24:00,  1.52it/s, shape=torch.Size([128, 32, 16, 16]), global_step=13161, grad_norm=0.0156, lr=0.0002, loss=0.196]
Train Epoch #2:  27%|██▋       | 2778/10399 [30:56<1:23:34,  1.52it/s, shape=torch.Size([128, 32, 16, 16]), global_step=13177, grad_norm=0.0172, lr=0.0002, loss=0.196]
Train Epoch #2:  27%|██▋       | 2794/10399 [31:06<1:23:12,  1.52it/s, shape=torch.Size([128, 32, 16, 16]), global_step=13193, grad_norm=0.0188, lr=0.0002, loss=0.197]
Train Epoch #2:  27%|██▋       | 2810/10399 [31:16<1:22:54,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=13209, grad_norm=0.0188, lr=0.0002, loss=0.197]
Train Epoch #2:  27%|██▋       | 2826/10399 [31:27<1:22:38,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=13225, grad_norm=0.0167, lr=0.0002, loss=0.196]
Train Epoch #2:  27%|██▋       | 2841/10399 [31:37<1:22:28,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=13240, grad_norm=0.0161, lr=0.0002, loss=0.196]
Train Epoch #2:  27%|██▋       | 2842/10399 [31:37<1:22:24,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=13241, grad_norm=0.0176, lr=0.0002, loss=0.196]
Train Epoch #2:  27%|██▋       | 2858/10399 [31:48<1:22:10,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=13257, grad_norm=0.0161, lr=0.0002, loss=0.196]
Train Epoch #2:  28%|██▊       | 2874/10399 [31:58<1:21:57,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=13273, grad_norm=0.0146, lr=0.0002, loss=0.196]
Train Epoch #2:  28%|██▊       | 2890/10399 [32:09<1:22:06,  1.52it/s, shape=torch.Size([128, 32, 16, 16]), global_step=13289, grad_norm=0.0175, lr=0.0002, loss=0.196]
Train Epoch #2:  28%|██▊       | 2906/10399 [32:19<1:21:48,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=13305, grad_norm=0.0161, lr=0.0002, loss=0.196]
Train Epoch #2:  28%|██▊       | 2922/10399 [32:30<1:21:33,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=13321, grad_norm=0.0174, lr=0.0002, loss=0.196]
Train Epoch #2:  28%|██▊       | 2938/10399 [32:40<1:21:21,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=13337, grad_norm=0.0162, lr=0.0002, loss=0.196]
Train Epoch #2:  28%|██▊       | 2954/10399 [32:51<1:21:09,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=13353, grad_norm=0.0156, lr=0.0002, loss=0.196]
Train Epoch #2:  29%|██▊       | 2969/10399 [33:01<1:20:59,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=13368, grad_norm=0.0158, lr=0.0002, loss=0.196]
Train Epoch #2:  29%|██▊       | 2970/10399 [33:01<1:20:55,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=13369, grad_norm=0.0147, lr=0.0002, loss=0.196]
Train Epoch #2:  29%|██▊       | 2986/10399 [33:12<1:20:44,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=13385, grad_norm=0.0147, lr=0.0002, loss=0.196]
Train Epoch #2:  29%|██▉       | 3002/10399 [33:22<1:20:33,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=13401, grad_norm=0.0166, lr=0.0002, loss=0.196]
Train Epoch #2:  29%|██▉       | 3018/10399 [33:32<1:20:20,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=13417, grad_norm=0.0165, lr=0.0002, loss=0.196]
Train Epoch #2:  29%|██▉       | 3034/10399 [33:43<1:20:07,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=13433, grad_norm=0.015, lr=0.0002, loss=0.196] 
Train Epoch #2:  29%|██▉       | 3050/10399 [33:53<1:19:57,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=13449, grad_norm=0.0163, lr=0.0002, loss=0.196]
Train Epoch #2:  29%|██▉       | 3066/10399 [34:04<1:19:47,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=13465, grad_norm=0.0173, lr=0.0002, loss=0.196]
Train Epoch #2:  30%|██▉       | 3082/10399 [34:14<1:19:36,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=13481, grad_norm=0.0183, lr=0.0002, loss=0.196]
Train Epoch #2:  30%|██▉       | 3098/10399 [34:25<1:19:25,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=13497, grad_norm=0.0174, lr=0.0002, loss=0.196]
Train Epoch #2:  30%|██▉       | 3114/10399 [34:35<1:19:15,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=13513, grad_norm=0.0161, lr=0.0002, loss=0.196]
Train Epoch #2:  30%|███       | 3130/10399 [34:46<1:19:04,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=13529, grad_norm=0.0159, lr=0.0002, loss=0.196]
Train Epoch #2:  30%|███       | 3146/10399 [34:56<1:18:53,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=13545, grad_norm=0.0164, lr=0.0002, loss=0.196]
Train Epoch #2:  30%|███       | 3162/10399 [35:06<1:18:44,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=13561, grad_norm=0.0156, lr=0.0002, loss=0.196]
Train Epoch #2:  31%|███       | 3178/10399 [35:17<1:18:34,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=13577, grad_norm=0.0148, lr=0.0002, loss=0.196]
Train Epoch #2:  31%|███       | 3193/10399 [35:27<1:18:24,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=13592, grad_norm=0.0151, lr=0.0002, loss=0.196]
Train Epoch #2:  31%|███       | 3194/10399 [35:27<1:18:25,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=13593, grad_norm=0.0151, lr=0.0002, loss=0.196]
Train Epoch #2:  31%|███       | 3210/10399 [35:38<1:18:15,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=13609, grad_norm=0.0158, lr=0.0002, loss=0.196]
Train Epoch #2:  31%|███       | 3226/10399 [35:48<1:18:05,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=13625, grad_norm=0.0156, lr=0.0002, loss=0.196]
Train Epoch #2:  31%|███       | 3242/10399 [35:59<1:18:12,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=13641, grad_norm=0.0166, lr=0.0002, loss=0.196]
Train Epoch #2:  31%|███▏      | 3258/10399 [36:09<1:17:55,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=13657, grad_norm=0.017, lr=0.0002, loss=0.196] 
Train Epoch #2:  31%|███▏      | 3274/10399 [36:20<1:17:41,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=13673, grad_norm=0.0144, lr=0.0002, loss=0.196]
Train Epoch #2:  32%|███▏      | 3290/10399 [36:30<1:17:29,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=13689, grad_norm=0.016, lr=0.0002, loss=0.196] 
Train Epoch #2:  32%|███▏      | 3306/10399 [36:41<1:17:18,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=13705, grad_norm=0.0153, lr=0.0002, loss=0.196]
Train Epoch #2:  32%|███▏      | 3321/10399 [36:51<1:17:08,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=13720, grad_norm=0.0155, lr=0.0002, loss=0.196]
Train Epoch #2:  32%|███▏      | 3322/10399 [36:51<1:17:07,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=13721, grad_norm=0.0148, lr=0.0002, loss=0.196]
Train Epoch #2:  32%|███▏      | 3338/10399 [37:02<1:16:56,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=13737, grad_norm=0.0167, lr=0.0002, loss=0.196]
Train Epoch #2:  32%|███▏      | 3354/10399 [37:12<1:16:44,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=13753, grad_norm=0.0166, lr=0.0002, loss=0.196]
Train Epoch #2:  32%|███▏      | 3370/10399 [37:22<1:16:33,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=13769, grad_norm=0.015, lr=0.0002, loss=0.196] 
Train Epoch #2:  33%|███▎      | 3386/10399 [37:33<1:16:22,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=13785, grad_norm=0.0169, lr=0.0002, loss=0.196]
Train Epoch #2:  33%|███▎      | 3402/10399 [37:43<1:16:09,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=13801, grad_norm=0.0184, lr=0.0002, loss=0.196]
Train Epoch #2:  33%|███▎      | 3418/10399 [37:54<1:15:57,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=13817, grad_norm=0.0178, lr=0.0002, loss=0.196]
Train Epoch #2:  33%|███▎      | 3434/10399 [38:04<1:15:46,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=13833, grad_norm=0.0154, lr=0.0002, loss=0.196]
Train Epoch #2:  33%|███▎      | 3450/10399 [38:15<1:15:36,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=13849, grad_norm=0.0181, lr=0.0002, loss=0.196]
Train Epoch #2:  33%|███▎      | 3466/10399 [38:25<1:15:26,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=13865, grad_norm=0.0153, lr=0.0002, loss=0.196]
Train Epoch #2:  33%|███▎      | 3482/10399 [38:36<1:15:17,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=13881, grad_norm=0.0167, lr=0.0002, loss=0.196]
Train Epoch #2:  34%|███▎      | 3498/10399 [38:46<1:15:07,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=13897, grad_norm=0.0167, lr=0.0002, loss=0.196]
Train Epoch #2:  34%|███▍      | 3514/10399 [38:56<1:14:57,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=13913, grad_norm=0.0145, lr=0.0002, loss=0.196]
Train Epoch #2:  34%|███▍      | 3530/10399 [39:07<1:14:46,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=13929, grad_norm=0.0139, lr=0.0002, loss=0.196]
Train Epoch #2:  34%|███▍      | 3545/10399 [39:17<1:14:37,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=13944, grad_norm=0.0155, lr=0.0002, loss=0.196]
Train Epoch #2:  34%|███▍      | 3546/10399 [39:17<1:14:36,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=13945, grad_norm=0.0154, lr=0.0002, loss=0.196]
Train Epoch #2:  34%|███▍      | 3562/10399 [39:28<1:14:25,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=13961, grad_norm=0.015, lr=0.0002, loss=0.196] 
Train Epoch #2:  34%|███▍      | 3578/10399 [39:38<1:14:14,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=13977, grad_norm=0.0151, lr=0.0002, loss=0.196]
Train Epoch #2:  35%|███▍      | 3594/10399 [39:49<1:14:02,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=13993, grad_norm=0.0145, lr=0.0002, loss=0.196]
Train Epoch #2:  35%|███▍      | 3601/10399 [40:01<1:13:58,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=14000, grad_norm=0.0145, lr=0.0002, loss=0.196]
save model to .exp/diffusion/imagenet_512/dc_ae_f32c32_in_1.0/dit_xl_1/bs_1024_lr_2e-4_fp16/checkpoint.pt at step 14000

Train Epoch #2:  35%|███▍      | 3602/10399 [40:08<1:48:24,  1.04it/s, shape=torch.Size([128, 32, 16, 16]), global_step=14001, grad_norm=0.0157, lr=0.0002, loss=0.196]
Train Epoch #2:  35%|███▍      | 3618/10399 [40:18<1:36:35,  1.17it/s, shape=torch.Size([128, 32, 16, 16]), global_step=14017, grad_norm=0.0156, lr=0.0002, loss=0.196]
Train Epoch #2:  35%|███▍      | 3634/10399 [40:29<1:29:18,  1.26it/s, shape=torch.Size([128, 32, 16, 16]), global_step=14033, grad_norm=0.0167, lr=0.0002, loss=0.196]
Train Epoch #2:  35%|███▌      | 3650/10399 [40:39<1:24:07,  1.34it/s, shape=torch.Size([128, 32, 16, 16]), global_step=14049, grad_norm=0.0148, lr=0.0002, loss=0.196]
Train Epoch #2:  35%|███▌      | 3666/10399 [40:50<1:20:35,  1.39it/s, shape=torch.Size([128, 32, 16, 16]), global_step=14065, grad_norm=0.015, lr=0.0002, loss=0.196] 
Train Epoch #2:  35%|███▌      | 3682/10399 [41:00<1:18:10,  1.43it/s, shape=torch.Size([128, 32, 16, 16]), global_step=14081, grad_norm=0.0152, lr=0.0002, loss=0.196]
Train Epoch #2:  36%|███▌      | 3698/10399 [41:11<1:16:26,  1.46it/s, shape=torch.Size([128, 32, 16, 16]), global_step=14097, grad_norm=0.0144, lr=0.0002, loss=0.196]
Train Epoch #2:  36%|███▌      | 3714/10399 [41:21<1:15:11,  1.48it/s, shape=torch.Size([128, 32, 16, 16]), global_step=14113, grad_norm=0.0151, lr=0.0002, loss=0.196]
Train Epoch #2:  36%|███▌      | 3729/10399 [41:31<1:15:01,  1.48it/s, shape=torch.Size([128, 32, 16, 16]), global_step=14128, grad_norm=0.0163, lr=0.0002, loss=0.196]
Train Epoch #2:  36%|███▌      | 3730/10399 [41:31<1:14:16,  1.50it/s, shape=torch.Size([128, 32, 16, 16]), global_step=14129, grad_norm=0.0173, lr=0.0002, loss=0.196]
Train Epoch #2:  36%|███▌      | 3746/10399 [41:42<1:13:35,  1.51it/s, shape=torch.Size([128, 32, 16, 16]), global_step=14145, grad_norm=0.0142, lr=0.0002, loss=0.196]
Train Epoch #2:  36%|███▌      | 3762/10399 [41:52<1:13:02,  1.51it/s, shape=torch.Size([128, 32, 16, 16]), global_step=14161, grad_norm=0.0153, lr=0.0002, loss=0.196]
Train Epoch #2:  36%|███▋      | 3778/10399 [42:03<1:12:37,  1.52it/s, shape=torch.Size([128, 32, 16, 16]), global_step=14177, grad_norm=0.0142, lr=0.0002, loss=0.196]
Train Epoch #2:  36%|███▋      | 3794/10399 [42:13<1:12:16,  1.52it/s, shape=torch.Size([128, 32, 16, 16]), global_step=14193, grad_norm=0.0163, lr=0.0002, loss=0.196]
Train Epoch #2:  37%|███▋      | 3810/10399 [42:24<1:11:58,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=14209, grad_norm=0.0151, lr=0.0002, loss=0.196]
Train Epoch #2:  37%|███▋      | 3826/10399 [42:34<1:11:43,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=14225, grad_norm=0.014, lr=0.0002, loss=0.196] 
Train Epoch #2:  37%|███▋      | 3842/10399 [42:45<1:11:30,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=14241, grad_norm=0.0148, lr=0.0002, loss=0.196]
Train Epoch #2:  37%|███▋      | 3858/10399 [42:55<1:11:17,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=14257, grad_norm=0.0166, lr=0.0002, loss=0.196]
Train Epoch #2:  37%|███▋      | 3874/10399 [43:05<1:11:04,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=14273, grad_norm=0.0135, lr=0.0002, loss=0.196]
Train Epoch #2:  37%|███▋      | 3890/10399 [43:16<1:10:53,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=14289, grad_norm=0.0157, lr=0.0002, loss=0.196]
Train Epoch #2:  38%|███▊      | 3906/10399 [43:26<1:10:41,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=14305, grad_norm=0.0145, lr=0.0002, loss=0.196]
Train Epoch #2:  38%|███▊      | 3922/10399 [43:37<1:10:31,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=14321, grad_norm=0.0156, lr=0.0002, loss=0.196]
Train Epoch #2:  38%|███▊      | 3937/10399 [43:47<1:10:21,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=14336, grad_norm=0.0147, lr=0.0002, loss=0.196]
Train Epoch #2:  38%|███▊      | 3938/10399 [43:47<1:10:19,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=14337, grad_norm=0.0153, lr=0.0002, loss=0.196]
Train Epoch #2:  38%|███▊      | 3954/10399 [43:58<1:10:09,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=14353, grad_norm=0.0153, lr=0.0002, loss=0.196]
Train Epoch #2:  38%|███▊      | 3970/10399 [44:08<1:09:58,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=14369, grad_norm=0.0132, lr=0.0002, loss=0.196]
Train Epoch #2:  38%|███▊      | 3986/10399 [44:19<1:09:47,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=14385, grad_norm=0.0153, lr=0.0002, loss=0.196]
Train Epoch #2:  38%|███▊      | 4002/10399 [44:29<1:09:37,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=14401, grad_norm=0.0151, lr=0.0002, loss=0.196]
Train Epoch #2:  39%|███▊      | 4018/10399 [44:39<1:09:26,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=14417, grad_norm=0.0152, lr=0.0002, loss=0.196]
Train Epoch #2:  39%|███▉      | 4034/10399 [44:50<1:09:16,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=14433, grad_norm=0.0136, lr=0.0002, loss=0.196]
Train Epoch #2:  39%|███▉      | 4050/10399 [45:01<1:09:22,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=14449, grad_norm=0.0158, lr=0.0002, loss=0.196]
Train Epoch #2:  39%|███▉      | 4066/10399 [45:11<1:09:06,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=14465, grad_norm=0.0145, lr=0.0002, loss=0.196]
Train Epoch #2:  39%|███▉      | 4081/10399 [45:21<1:08:56,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=14480, grad_norm=0.016, lr=0.0002, loss=0.196] 
Train Epoch #2:  39%|███▉      | 4082/10399 [45:21<1:08:52,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=14481, grad_norm=0.014, lr=0.0002, loss=0.196]
Train Epoch #2:  39%|███▉      | 4098/10399 [45:32<1:08:39,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=14497, grad_norm=0.0149, lr=0.0002, loss=0.196]
Train Epoch #2:  40%|███▉      | 4114/10399 [45:42<1:08:28,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=14513, grad_norm=0.0146, lr=0.0002, loss=0.196]
Train Epoch #2:  40%|███▉      | 4130/10399 [45:53<1:08:17,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=14529, grad_norm=0.0151, lr=0.0002, loss=0.196]
Train Epoch #2:  40%|███▉      | 4146/10399 [46:03<1:08:06,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=14545, grad_norm=0.0151, lr=0.0002, loss=0.196]
Train Epoch #2:  40%|████      | 4162/10399 [46:14<1:07:54,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=14561, grad_norm=0.0144, lr=0.0002, loss=0.196]
Train Epoch #2:  40%|████      | 4178/10399 [46:24<1:07:43,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=14577, grad_norm=0.0143, lr=0.0002, loss=0.196]
Train Epoch #2:  40%|████      | 4194/10399 [46:35<1:07:32,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=14593, grad_norm=0.0141, lr=0.0002, loss=0.196]
Train Epoch #2:  40%|████      | 4210/10399 [46:45<1:07:21,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=14609, grad_norm=0.0134, lr=0.0002, loss=0.196]
Train Epoch #2:  41%|████      | 4226/10399 [46:55<1:07:11,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=14625, grad_norm=0.0172, lr=0.0002, loss=0.196]
Train Epoch #2:  41%|████      | 4242/10399 [47:06<1:07:00,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=14641, grad_norm=0.0155, lr=0.0002, loss=0.196]
Train Epoch #2:  41%|████      | 4258/10399 [47:16<1:06:50,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=14657, grad_norm=0.0141, lr=0.0002, loss=0.196]
Train Epoch #2:  41%|████      | 4274/10399 [47:27<1:06:39,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=14673, grad_norm=0.0143, lr=0.0002, loss=0.196]
Train Epoch #2:  41%|████      | 4289/10399 [47:37<1:06:29,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=14688, grad_norm=0.0143, lr=0.0002, loss=0.196]
Train Epoch #2:  41%|████▏     | 4290/10399 [47:37<1:06:28,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=14689, grad_norm=0.014, lr=0.0002, loss=0.196] 
Train Epoch #2:  41%|████▏     | 4306/10399 [47:48<1:06:17,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=14705, grad_norm=0.0154, lr=0.0002, loss=0.196]
Train Epoch #2:  42%|████▏     | 4322/10399 [47:58<1:06:07,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=14721, grad_norm=0.0155, lr=0.0002, loss=0.196]
Train Epoch #2:  42%|████▏     | 4338/10399 [48:09<1:05:56,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=14737, grad_norm=0.0152, lr=0.0002, loss=0.196]
Train Epoch #2:  42%|████▏     | 4354/10399 [48:19<1:05:48,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=14753, grad_norm=0.016, lr=0.0002, loss=0.196] 
Train Epoch #2:  42%|████▏     | 4370/10399 [48:29<1:05:38,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=14769, grad_norm=0.0141, lr=0.0002, loss=0.196]
Train Epoch #2:  42%|████▏     | 4386/10399 [48:40<1:05:27,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=14785, grad_norm=0.0143, lr=0.0002, loss=0.196]
Train Epoch #2:  42%|████▏     | 4402/10399 [48:50<1:05:16,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=14801, grad_norm=0.0129, lr=0.0002, loss=0.196]
Train Epoch #2:  42%|████▏     | 4418/10399 [49:01<1:05:21,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=14817, grad_norm=0.0132, lr=0.0002, loss=0.196]
Train Epoch #2:  43%|████▎     | 4433/10399 [49:11<1:05:11,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=14832, grad_norm=0.0144, lr=0.0002, loss=0.196]
Train Epoch #2:  43%|████▎     | 4434/10399 [49:11<1:05:05,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=14833, grad_norm=0.0148, lr=0.0002, loss=0.196]
Train Epoch #2:  43%|████▎     | 4450/10399 [49:22<1:04:52,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=14849, grad_norm=0.0174, lr=0.0002, loss=0.196]
Train Epoch #2:  43%|████▎     | 4466/10399 [49:32<1:04:40,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=14865, grad_norm=0.0144, lr=0.0002, loss=0.196]
Train Epoch #2:  43%|████▎     | 4482/10399 [49:43<1:04:27,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=14881, grad_norm=0.0142, lr=0.0002, loss=0.196]
Train Epoch #2:  43%|████▎     | 4498/10399 [49:53<1:04:16,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=14897, grad_norm=0.014, lr=0.0002, loss=0.196] 
Train Epoch #2:  43%|████▎     | 4514/10399 [50:04<1:04:04,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=14913, grad_norm=0.0135, lr=0.0002, loss=0.196]
Train Epoch #2:  44%|████▎     | 4530/10399 [50:14<1:03:53,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=14929, grad_norm=0.015, lr=0.0002, loss=0.196] 
Train Epoch #2:  44%|████▎     | 4546/10399 [50:25<1:03:42,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=14945, grad_norm=0.0129, lr=0.0002, loss=0.196]
Train Epoch #2:  44%|████▍     | 4562/10399 [50:35<1:03:32,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=14961, grad_norm=0.0167, lr=0.0002, loss=0.196]
Train Epoch #2:  44%|████▍     | 4578/10399 [50:45<1:03:21,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=14977, grad_norm=0.014, lr=0.0002, loss=0.196] 
Train Epoch #2:  44%|████▍     | 4594/10399 [50:56<1:03:11,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=14993, grad_norm=0.0139, lr=0.0002, loss=0.196]
Train Epoch #2:  44%|████▍     | 4601/10399 [51:07<1:03:06,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=15000, grad_norm=0.0133, lr=0.0002, loss=0.196]
save model to .exp/diffusion/imagenet_512/dc_ae_f32c32_in_1.0/dit_xl_1/bs_1024_lr_2e-4_fp16/checkpoint.pt at step 15000

Train Epoch #2:  44%|████▍     | 4602/10399 [51:16<1:34:30,  1.02it/s, shape=torch.Size([128, 32, 16, 16]), global_step=15001, grad_norm=0.0141, lr=0.0002, loss=0.196]
Train Epoch #2:  44%|████▍     | 4618/10399 [51:26<1:23:42,  1.15it/s, shape=torch.Size([128, 32, 16, 16]), global_step=15017, grad_norm=0.0148, lr=0.0002, loss=0.196]
Train Epoch #2:  45%|████▍     | 4634/10399 [51:37<1:16:44,  1.25it/s, shape=torch.Size([128, 32, 16, 16]), global_step=15033, grad_norm=0.0144, lr=0.0002, loss=0.196]
Train Epoch #2:  45%|████▍     | 4649/10399 [51:47<1:16:32,  1.25it/s, shape=torch.Size([128, 32, 16, 16]), global_step=15048, grad_norm=0.0165, lr=0.0002, loss=0.196]
Train Epoch #2:  45%|████▍     | 4650/10399 [51:47<1:12:06,  1.33it/s, shape=torch.Size([128, 32, 16, 16]), global_step=15049, grad_norm=0.0152, lr=0.0002, loss=0.196]
Train Epoch #2:  45%|████▍     | 4666/10399 [51:58<1:08:54,  1.39it/s, shape=torch.Size([128, 32, 16, 16]), global_step=15065, grad_norm=0.0139, lr=0.0002, loss=0.196]
Train Epoch #2:  45%|████▌     | 4682/10399 [52:08<1:06:42,  1.43it/s, shape=torch.Size([128, 32, 16, 16]), global_step=15081, grad_norm=0.0143, lr=0.0002, loss=0.196]
Train Epoch #2:  45%|████▌     | 4698/10399 [52:18<1:05:08,  1.46it/s, shape=torch.Size([128, 32, 16, 16]), global_step=15097, grad_norm=0.0141, lr=0.0002, loss=0.196]
Train Epoch #2:  45%|████▌     | 4714/10399 [52:29<1:04:01,  1.48it/s, shape=torch.Size([128, 32, 16, 16]), global_step=15113, grad_norm=0.0143, lr=0.0002, loss=0.196]
Train Epoch #2:  45%|████▌     | 4730/10399 [52:39<1:03:11,  1.50it/s, shape=torch.Size([128, 32, 16, 16]), global_step=15129, grad_norm=0.0152, lr=0.0002, loss=0.196]
Train Epoch #2:  46%|████▌     | 4746/10399 [52:50<1:02:33,  1.51it/s, shape=torch.Size([128, 32, 16, 16]), global_step=15145, grad_norm=0.015, lr=0.0002, loss=0.196] 
Train Epoch #2:  46%|████▌     | 4762/10399 [53:00<1:02:04,  1.51it/s, shape=torch.Size([128, 32, 16, 16]), global_step=15161, grad_norm=0.0134, lr=0.0002, loss=0.196]
Train Epoch #2:  46%|████▌     | 4778/10399 [53:11<1:01:40,  1.52it/s, shape=torch.Size([128, 32, 16, 16]), global_step=15177, grad_norm=0.0139, lr=0.0002, loss=0.196]
Train Epoch #2:  46%|████▌     | 4793/10399 [53:21<1:01:30,  1.52it/s, shape=torch.Size([128, 32, 16, 16]), global_step=15192, grad_norm=0.0147, lr=0.0002, loss=0.196]
Train Epoch #2:  46%|████▌     | 4794/10399 [53:21<1:01:36,  1.52it/s, shape=torch.Size([128, 32, 16, 16]), global_step=15193, grad_norm=0.0143, lr=0.0002, loss=0.196]
Train Epoch #2:  46%|████▋     | 4810/10399 [53:32<1:01:14,  1.52it/s, shape=torch.Size([128, 32, 16, 16]), global_step=15209, grad_norm=0.0144, lr=0.0002, loss=0.196]
Train Epoch #2:  46%|████▋     | 4826/10399 [53:42<1:00:54,  1.52it/s, shape=torch.Size([128, 32, 16, 16]), global_step=15225, grad_norm=0.0142, lr=0.0002, loss=0.196]
Train Epoch #2:  47%|████▋     | 4842/10399 [53:53<1:00:38,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=15241, grad_norm=0.0148, lr=0.0002, loss=0.196]
Train Epoch #2:  47%|████▋     | 4858/10399 [54:03<1:00:25,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=15257, grad_norm=0.0153, lr=0.0002, loss=0.196]
Train Epoch #2:  47%|████▋     | 4874/10399 [54:14<1:00:12,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=15273, grad_norm=0.013, lr=0.0002, loss=0.196] 
Train Epoch #2:  47%|████▋     | 4890/10399 [54:24<1:00:00,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=15289, grad_norm=0.0141, lr=0.0002, loss=0.196]
Train Epoch #2:  47%|████▋     | 4906/10399 [54:34<59:48,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=15305, grad_norm=0.0142, lr=0.0002, loss=0.196]  
Train Epoch #2:  47%|████▋     | 4922/10399 [54:45<59:37,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=15321, grad_norm=0.0165, lr=0.0002, loss=0.196]
Train Epoch #2:  47%|████▋     | 4938/10399 [54:55<59:25,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=15337, grad_norm=0.0131, lr=0.0002, loss=0.196]
Train Epoch #2:  48%|████▊     | 4954/10399 [55:06<59:14,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=15353, grad_norm=0.0149, lr=0.0002, loss=0.196]
Train Epoch #2:  48%|████▊     | 4970/10399 [55:16<59:03,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=15369, grad_norm=0.0134, lr=0.0002, loss=0.196]
Train Epoch #2:  48%|████▊     | 4986/10399 [55:27<58:52,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=15385, grad_norm=0.0151, lr=0.0002, loss=0.196]
Train Epoch #2:  48%|████▊     | 5002/10399 [55:37<58:42,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=15401, grad_norm=0.0128, lr=0.0002, loss=0.196]
Train Epoch #2:  48%|████▊     | 5017/10399 [55:47<58:32,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=15416, grad_norm=0.0152, lr=0.0002, loss=0.196]
Train Epoch #2:  48%|████▊     | 5018/10399 [55:48<58:32,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=15417, grad_norm=0.0136, lr=0.0002, loss=0.196]
Train Epoch #2:  48%|████▊     | 5034/10399 [55:58<58:21,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=15433, grad_norm=0.0141, lr=0.0002, loss=0.196]
Train Epoch #2:  49%|████▊     | 5050/10399 [56:08<58:10,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=15449, grad_norm=0.0148, lr=0.0002, loss=0.196]
Train Epoch #2:  49%|████▊     | 5066/10399 [56:19<58:00,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=15465, grad_norm=0.0141, lr=0.0002, loss=0.196]
Train Epoch #2:  49%|████▉     | 5082/10399 [56:29<57:50,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=15481, grad_norm=0.0134, lr=0.0002, loss=0.196]
Train Epoch #2:  49%|████▉     | 5098/10399 [56:40<57:41,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=15497, grad_norm=0.0155, lr=0.0002, loss=0.196]
Train Epoch #2:  49%|████▉     | 5114/10399 [56:50<57:30,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=15513, grad_norm=0.0138, lr=0.0002, loss=0.196]
Train Epoch #2:  49%|████▉     | 5130/10399 [57:01<57:19,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=15529, grad_norm=0.0139, lr=0.0002, loss=0.196]
Train Epoch #2:  49%|████▉     | 5146/10399 [57:11<57:08,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=15545, grad_norm=0.0128, lr=0.0002, loss=0.196]
Train Epoch #2:  50%|████▉     | 5161/10399 [57:21<56:58,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=15560, grad_norm=0.0139, lr=0.0002, loss=0.196]
Train Epoch #2:  50%|████▉     | 5162/10399 [57:21<56:57,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=15561, grad_norm=0.0137, lr=0.0002, loss=0.196]
Train Epoch #2:  50%|████▉     | 5178/10399 [57:32<56:47,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=15577, grad_norm=0.0134, lr=0.0002, loss=0.196]
Train Epoch #2:  50%|████▉     | 5194/10399 [57:42<56:36,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=15593, grad_norm=0.014, lr=0.0002, loss=0.196] 
Train Epoch #2:  50%|█████     | 5210/10399 [57:53<56:41,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=15609, grad_norm=0.013, lr=0.0002, loss=0.196]
Train Epoch #2:  50%|█████     | 5226/10399 [58:03<56:27,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=15625, grad_norm=0.0152, lr=0.0002, loss=0.196]
Train Epoch #2:  50%|█████     | 5242/10399 [58:14<56:13,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=15641, grad_norm=0.0139, lr=0.0002, loss=0.196]
Train Epoch #2:  51%|█████     | 5258/10399 [58:24<56:01,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=15657, grad_norm=0.013, lr=0.0002, loss=0.196] 
Train Epoch #2:  51%|█████     | 5274/10399 [58:35<55:49,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=15673, grad_norm=0.0128, lr=0.0002, loss=0.196]
Train Epoch #2:  51%|█████     | 5290/10399 [58:45<55:37,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=15689, grad_norm=0.0145, lr=0.0002, loss=0.196]
Train Epoch #2:  51%|█████     | 5306/10399 [58:56<55:26,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=15705, grad_norm=0.0137, lr=0.0002, loss=0.196]
Train Epoch #2:  51%|█████     | 5322/10399 [59:06<55:15,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=15721, grad_norm=0.0169, lr=0.0002, loss=0.196]
Train Epoch #2:  51%|█████▏    | 5338/10399 [59:17<55:04,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=15737, grad_norm=0.0151, lr=0.0002, loss=0.196]
Train Epoch #2:  51%|█████▏    | 5354/10399 [59:27<54:53,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=15753, grad_norm=0.0131, lr=0.0002, loss=0.196]
Train Epoch #2:  52%|█████▏    | 5369/10399 [59:37<54:43,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=15768, grad_norm=0.0141, lr=0.0002, loss=0.196]
Train Epoch #2:  52%|█████▏    | 5370/10399 [59:37<54:42,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=15769, grad_norm=0.0136, lr=0.0002, loss=0.196]
Train Epoch #2:  52%|█████▏    | 5386/10399 [59:48<54:32,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=15785, grad_norm=0.0148, lr=0.0002, loss=0.196]
Train Epoch #2:  52%|█████▏    | 5402/10399 [59:58<54:22,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=15801, grad_norm=0.0162, lr=0.0002, loss=0.196]
Train Epoch #2:  52%|█████▏    | 5418/10399 [1:00:09<54:11,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=15817, grad_norm=0.0143, lr=0.0002, loss=0.196]
Train Epoch #2:  52%|█████▏    | 5434/10399 [1:00:19<53:59,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=15833, grad_norm=0.0138, lr=0.0002, loss=0.196]
Train Epoch #2:  52%|█████▏    | 5450/10399 [1:00:30<53:48,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=15849, grad_norm=0.016, lr=0.0002, loss=0.196] 
Train Epoch #2:  53%|█████▎    | 5466/10399 [1:00:40<53:38,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=15865, grad_norm=0.0139, lr=0.0002, loss=0.196]
Train Epoch #2:  53%|█████▎    | 5482/10399 [1:00:51<53:29,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=15881, grad_norm=0.0166, lr=0.0002, loss=0.196]
Train Epoch #2:  53%|█████▎    | 5498/10399 [1:01:01<53:18,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=15897, grad_norm=0.0124, lr=0.0002, loss=0.196]
Train Epoch #2:  53%|█████▎    | 5513/10399 [1:01:11<53:08,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=15912, grad_norm=0.0128, lr=0.0002, loss=0.196]
Train Epoch #2:  53%|█████▎    | 5514/10399 [1:01:11<53:08,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=15913, grad_norm=0.0153, lr=0.0002, loss=0.196]
Train Epoch #2:  53%|█████▎    | 5530/10399 [1:01:22<52:57,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=15929, grad_norm=0.0122, lr=0.0002, loss=0.196]
Train Epoch #2:  53%|█████▎    | 5546/10399 [1:01:32<52:47,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=15945, grad_norm=0.0147, lr=0.0002, loss=0.196]
Train Epoch #2:  53%|█████▎    | 5562/10399 [1:01:43<52:37,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=15961, grad_norm=0.0138, lr=0.0002, loss=0.196]
Train Epoch #2:  54%|█████▎    | 5578/10399 [1:01:53<52:41,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=15977, grad_norm=0.0135, lr=0.0002, loss=0.196]
Train Epoch #2:  54%|█████▍    | 5594/10399 [1:02:04<52:26,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=15993, grad_norm=0.013, lr=0.0002, loss=0.196] 
Train Epoch #2:  54%|█████▍    | 5601/10399 [1:02:17<52:21,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=16000, grad_norm=0.0135, lr=0.0002, loss=0.196]
save model to .exp/diffusion/imagenet_512/dc_ae_f32c32_in_1.0/dit_xl_1/bs_1024_lr_2e-4_fp16/checkpoint.pt at step 16000

Train Epoch #2:  54%|█████▍    | 5602/10399 [1:02:23<1:17:11,  1.04it/s, shape=torch.Size([128, 32, 16, 16]), global_step=16001, grad_norm=0.0135, lr=0.0002, loss=0.196]
Train Epoch #2:  54%|█████▍    | 5618/10399 [1:02:34<1:08:31,  1.16it/s, shape=torch.Size([128, 32, 16, 16]), global_step=16017, grad_norm=0.0141, lr=0.0002, loss=0.196]
Train Epoch #2:  54%|█████▍    | 5634/10399 [1:02:44<1:02:57,  1.26it/s, shape=torch.Size([128, 32, 16, 16]), global_step=16033, grad_norm=0.0153, lr=0.0002, loss=0.196]
Train Epoch #2:  54%|█████▍    | 5650/10399 [1:02:54<59:14,  1.34it/s, shape=torch.Size([128, 32, 16, 16]), global_step=16049, grad_norm=0.0126, lr=0.0002, loss=0.196]  
Train Epoch #2:  54%|█████▍    | 5666/10399 [1:03:05<56:41,  1.39it/s, shape=torch.Size([128, 32, 16, 16]), global_step=16065, grad_norm=0.0148, lr=0.0002, loss=0.196]
Train Epoch #2:  55%|█████▍    | 5682/10399 [1:03:15<54:53,  1.43it/s, shape=torch.Size([128, 32, 16, 16]), global_step=16081, grad_norm=0.0146, lr=0.0002, loss=0.196]
Train Epoch #2:  55%|█████▍    | 5698/10399 [1:03:26<53:36,  1.46it/s, shape=torch.Size([128, 32, 16, 16]), global_step=16097, grad_norm=0.0147, lr=0.0002, loss=0.196]
Train Epoch #2:  55%|█████▍    | 5714/10399 [1:03:36<52:41,  1.48it/s, shape=torch.Size([128, 32, 16, 16]), global_step=16113, grad_norm=0.0125, lr=0.0002, loss=0.196]
Train Epoch #2:  55%|█████▌    | 5730/10399 [1:03:47<51:59,  1.50it/s, shape=torch.Size([128, 32, 16, 16]), global_step=16129, grad_norm=0.0137, lr=0.0002, loss=0.196]
Train Epoch #2:  55%|█████▌    | 5746/10399 [1:03:57<51:27,  1.51it/s, shape=torch.Size([128, 32, 16, 16]), global_step=16145, grad_norm=0.0127, lr=0.0002, loss=0.196]
Train Epoch #2:  55%|█████▌    | 5761/10399 [1:04:07<51:17,  1.51it/s, shape=torch.Size([128, 32, 16, 16]), global_step=16160, grad_norm=0.0129, lr=0.0002, loss=0.196]
Train Epoch #2:  55%|█████▌    | 5762/10399 [1:04:07<51:00,  1.52it/s, shape=torch.Size([128, 32, 16, 16]), global_step=16161, grad_norm=0.0146, lr=0.0002, loss=0.196]
Train Epoch #2:  56%|█████▌    | 5778/10399 [1:04:18<50:40,  1.52it/s, shape=torch.Size([128, 32, 16, 16]), global_step=16177, grad_norm=0.017, lr=0.0002, loss=0.196] 
Train Epoch #2:  56%|█████▌    | 5794/10399 [1:04:28<50:23,  1.52it/s, shape=torch.Size([128, 32, 16, 16]), global_step=16193, grad_norm=0.0154, lr=0.0002, loss=0.196]
Train Epoch #2:  56%|█████▌    | 5810/10399 [1:04:39<50:07,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=16209, grad_norm=0.0136, lr=0.0002, loss=0.196]
Train Epoch #2:  56%|█████▌    | 5826/10399 [1:04:49<49:53,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=16225, grad_norm=0.016, lr=0.0002, loss=0.196] 
Train Epoch #2:  56%|█████▌    | 5842/10399 [1:05:00<49:39,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=16241, grad_norm=0.0136, lr=0.0002, loss=0.196]
Train Epoch #2:  56%|█████▋    | 5858/10399 [1:05:10<49:27,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=16257, grad_norm=0.0165, lr=0.0002, loss=0.196]
Train Epoch #2:  56%|█████▋    | 5874/10399 [1:05:21<49:16,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=16273, grad_norm=0.013, lr=0.0002, loss=0.196] 
Train Epoch #2:  57%|█████▋    | 5890/10399 [1:05:31<49:05,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=16289, grad_norm=0.0148, lr=0.0002, loss=0.196]
Train Epoch #2:  57%|█████▋    | 5905/10399 [1:05:41<48:55,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=16304, grad_norm=0.0148, lr=0.0002, loss=0.196]
Train Epoch #2:  57%|█████▋    | 5906/10399 [1:05:41<48:53,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=16305, grad_norm=0.0135, lr=0.0002, loss=0.196]
Train Epoch #2:  57%|█████▋    | 5922/10399 [1:05:52<48:54,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=16321, grad_norm=0.0132, lr=0.0002, loss=0.196]
Train Epoch #2:  57%|█████▋    | 5938/10399 [1:06:03<48:40,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=16337, grad_norm=0.0137, lr=0.0002, loss=0.196]
Train Epoch #2:  57%|█████▋    | 5954/10399 [1:06:13<48:27,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=16353, grad_norm=0.0145, lr=0.0002, loss=0.196]
Train Epoch #2:  57%|█████▋    | 5970/10399 [1:06:23<48:14,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=16369, grad_norm=0.0126, lr=0.0002, loss=0.196]
Train Epoch #2:  58%|█████▊    | 5986/10399 [1:06:34<48:03,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=16385, grad_norm=0.0136, lr=0.0002, loss=0.196]
Train Epoch #2:  58%|█████▊    | 6002/10399 [1:06:44<47:51,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=16401, grad_norm=0.0143, lr=0.0002, loss=0.196]
Train Epoch #2:  58%|█████▊    | 6018/10399 [1:06:55<47:40,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=16417, grad_norm=0.014, lr=0.0002, loss=0.196] 
Train Epoch #2:  58%|█████▊    | 6034/10399 [1:07:05<47:30,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=16433, grad_norm=0.0128, lr=0.0002, loss=0.196]
Train Epoch #2:  58%|█████▊    | 6050/10399 [1:07:16<47:21,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=16449, grad_norm=0.0144, lr=0.0002, loss=0.196]
Train Epoch #2:  58%|█████▊    | 6066/10399 [1:07:26<47:09,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=16465, grad_norm=0.013, lr=0.0002, loss=0.196] 
Train Epoch #2:  58%|█████▊    | 6082/10399 [1:07:37<46:59,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=16481, grad_norm=0.0122, lr=0.0002, loss=0.196]
Train Epoch #2:  59%|█████▊    | 6098/10399 [1:07:47<46:48,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=16497, grad_norm=0.0135, lr=0.0002, loss=0.196]
Train Epoch #2:  59%|█████▉    | 6113/10399 [1:07:57<46:39,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=16512, grad_norm=0.014, lr=0.0002, loss=0.196] 
Train Epoch #2:  59%|█████▉    | 6114/10399 [1:07:57<46:38,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=16513, grad_norm=0.0129, lr=0.0002, loss=0.196]
Train Epoch #2:  59%|█████▉    | 6130/10399 [1:08:08<46:27,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=16529, grad_norm=0.012, lr=0.0002, loss=0.196] 
Train Epoch #2:  59%|█████▉    | 6146/10399 [1:08:18<46:16,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=16545, grad_norm=0.0129, lr=0.0002, loss=0.196]
Train Epoch #2:  59%|█████▉    | 6162/10399 [1:08:29<46:06,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=16561, grad_norm=0.0131, lr=0.0002, loss=0.196]
Train Epoch #2:  59%|█████▉    | 6178/10399 [1:08:39<45:55,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=16577, grad_norm=0.0123, lr=0.0002, loss=0.196]
Train Epoch #2:  60%|█████▉    | 6194/10399 [1:08:50<45:45,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=16593, grad_norm=0.0137, lr=0.0002, loss=0.196]
Train Epoch #2:  60%|█████▉    | 6210/10399 [1:09:00<45:35,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=16609, grad_norm=0.0127, lr=0.0002, loss=0.196]
Train Epoch #2:  60%|█████▉    | 6226/10399 [1:09:11<45:24,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=16625, grad_norm=0.0127, lr=0.0002, loss=0.196]
Train Epoch #2:  60%|██████    | 6242/10399 [1:09:21<45:14,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=16641, grad_norm=0.0143, lr=0.0002, loss=0.196]
Train Epoch #2:  60%|██████    | 6257/10399 [1:09:31<45:04,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=16656, grad_norm=0.0139, lr=0.0002, loss=0.196]
Train Epoch #2:  60%|██████    | 6258/10399 [1:09:31<45:04,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=16657, grad_norm=0.0134, lr=0.0002, loss=0.196]
Train Epoch #2:  60%|██████    | 6274/10399 [1:09:42<44:53,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=16673, grad_norm=0.0132, lr=0.0002, loss=0.196]
Train Epoch #2:  60%|██████    | 6290/10399 [1:09:52<44:41,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=16689, grad_norm=0.0131, lr=0.0002, loss=0.196]
Train Epoch #2:  61%|██████    | 6306/10399 [1:10:03<44:31,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=16705, grad_norm=0.0139, lr=0.0002, loss=0.196]
Train Epoch #2:  61%|██████    | 6322/10399 [1:10:13<44:20,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=16721, grad_norm=0.0133, lr=0.0002, loss=0.196]
Train Epoch #2:  61%|██████    | 6338/10399 [1:10:24<44:11,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=16737, grad_norm=0.016, lr=0.0002, loss=0.196] 
Train Epoch #2:  61%|██████    | 6354/10399 [1:10:34<44:12,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=16753, grad_norm=0.0155, lr=0.0002, loss=0.196]
Train Epoch #2:  61%|██████▏   | 6370/10399 [1:10:45<43:58,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=16769, grad_norm=0.0133, lr=0.0002, loss=0.196]
Train Epoch #2:  61%|██████▏   | 6386/10399 [1:10:55<43:46,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=16785, grad_norm=0.0132, lr=0.0002, loss=0.196]
Train Epoch #2:  62%|██████▏   | 6402/10399 [1:11:06<43:33,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=16801, grad_norm=0.0158, lr=0.0002, loss=0.196]
Train Epoch #2:  62%|██████▏   | 6418/10399 [1:11:16<43:22,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=16817, grad_norm=0.013, lr=0.0002, loss=0.196] 
Train Epoch #2:  62%|██████▏   | 6434/10399 [1:11:26<43:10,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=16833, grad_norm=0.0136, lr=0.0002, loss=0.196]
Train Epoch #2:  62%|██████▏   | 6450/10399 [1:11:37<42:59,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=16849, grad_norm=0.0137, lr=0.0002, loss=0.196]
Train Epoch #2:  62%|██████▏   | 6465/10399 [1:11:47<42:49,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=16864, grad_norm=0.0138, lr=0.0002, loss=0.196]
Train Epoch #2:  62%|██████▏   | 6466/10399 [1:11:47<42:48,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=16865, grad_norm=0.0137, lr=0.0002, loss=0.196]
Train Epoch #2:  62%|██████▏   | 6482/10399 [1:11:58<42:38,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=16881, grad_norm=0.0138, lr=0.0002, loss=0.196]
Train Epoch #2:  62%|██████▏   | 6498/10399 [1:12:08<42:28,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=16897, grad_norm=0.0137, lr=0.0002, loss=0.196]
Train Epoch #2:  63%|██████▎   | 6514/10399 [1:12:19<42:17,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=16913, grad_norm=0.0122, lr=0.0002, loss=0.196]
Train Epoch #2:  63%|██████▎   | 6530/10399 [1:12:29<42:07,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=16929, grad_norm=0.0153, lr=0.0002, loss=0.196]
Train Epoch #2:  63%|██████▎   | 6546/10399 [1:12:40<41:57,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=16945, grad_norm=0.0118, lr=0.0002, loss=0.196]
Train Epoch #2:  63%|██████▎   | 6562/10399 [1:12:50<41:47,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=16961, grad_norm=0.0126, lr=0.0002, loss=0.196]
Train Epoch #2:  63%|██████▎   | 6578/10399 [1:13:01<41:36,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=16977, grad_norm=0.0121, lr=0.0002, loss=0.196]
Train Epoch #2:  63%|██████▎   | 6594/10399 [1:13:11<41:24,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=16993, grad_norm=0.0133, lr=0.0002, loss=0.196]
Train Epoch #2:  63%|██████▎   | 6601/10399 [1:13:21<41:20,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=17000, grad_norm=0.0142, lr=0.0002, loss=0.196]
save model to .exp/diffusion/imagenet_512/dc_ae_f32c32_in_1.0/dit_xl_1/bs_1024_lr_2e-4_fp16/checkpoint.pt at step 17000
Train Epoch #2:  63%|██████▎   | 6602/10399 [1:13:30<1:00:07,  1.05it/s, shape=torch.Size([128, 32, 16, 16]), global_step=17001, grad_norm=0.0128, lr=0.0002, loss=0.196]
Train Epoch #2:  64%|██████▎   | 6618/10399 [1:13:40<53:34,  1.18it/s, shape=torch.Size([128, 32, 16, 16]), global_step=17017, grad_norm=0.0129, lr=0.0002, loss=0.196]  
Train Epoch #2:  64%|██████▍   | 6634/10399 [1:13:51<49:18,  1.27it/s, shape=torch.Size([128, 32, 16, 16]), global_step=17033, grad_norm=0.0129, lr=0.0002, loss=0.196]
Train Epoch #2:  64%|██████▍   | 6650/10399 [1:14:01<46:28,  1.34it/s, shape=torch.Size([128, 32, 16, 16]), global_step=17049, grad_norm=0.0139, lr=0.0002, loss=0.196]
Train Epoch #2:  64%|██████▍   | 6665/10399 [1:14:11<46:17,  1.34it/s, shape=torch.Size([128, 32, 16, 16]), global_step=17064, grad_norm=0.0129, lr=0.0002, loss=0.196]
Train Epoch #2:  64%|██████▍   | 6666/10399 [1:14:11<44:31,  1.40it/s, shape=torch.Size([128, 32, 16, 16]), global_step=17065, grad_norm=0.0145, lr=0.0002, loss=0.196]
Train Epoch #2:  64%|██████▍   | 6682/10399 [1:14:22<43:07,  1.44it/s, shape=torch.Size([128, 32, 16, 16]), global_step=17081, grad_norm=0.0127, lr=0.0002, loss=0.196]
Train Epoch #2:  64%|██████▍   | 6698/10399 [1:14:32<42:07,  1.46it/s, shape=torch.Size([128, 32, 16, 16]), global_step=17097, grad_norm=0.0133, lr=0.0002, loss=0.196]
Train Epoch #2:  65%|██████▍   | 6714/10399 [1:14:43<41:32,  1.48it/s, shape=torch.Size([128, 32, 16, 16]), global_step=17113, grad_norm=0.0144, lr=0.0002, loss=0.196]
Train Epoch #2:  65%|██████▍   | 6730/10399 [1:14:53<40:55,  1.49it/s, shape=torch.Size([128, 32, 16, 16]), global_step=17129, grad_norm=0.0141, lr=0.0002, loss=0.196]
Train Epoch #2:  65%|██████▍   | 6746/10399 [1:15:04<40:26,  1.51it/s, shape=torch.Size([128, 32, 16, 16]), global_step=17145, grad_norm=0.014, lr=0.0002, loss=0.196] 
Train Epoch #2:  65%|██████▌   | 6762/10399 [1:15:14<40:03,  1.51it/s, shape=torch.Size([128, 32, 16, 16]), global_step=17161, grad_norm=0.0137, lr=0.0002, loss=0.195]
Train Epoch #2:  65%|██████▌   | 6778/10399 [1:15:25<39:43,  1.52it/s, shape=torch.Size([128, 32, 16, 16]), global_step=17177, grad_norm=0.0128, lr=0.0002, loss=0.195]
Train Epoch #2:  65%|██████▌   | 6794/10399 [1:15:35<39:24,  1.52it/s, shape=torch.Size([128, 32, 16, 16]), global_step=17193, grad_norm=0.0127, lr=0.0002, loss=0.195]
Train Epoch #2:  65%|██████▌   | 6810/10399 [1:15:45<39:08,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=17209, grad_norm=0.0126, lr=0.0002, loss=0.195]
Train Epoch #2:  66%|██████▌   | 6826/10399 [1:15:56<38:54,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=17225, grad_norm=0.0146, lr=0.0002, loss=0.195]
Train Epoch #2:  66%|██████▌   | 6842/10399 [1:16:06<38:42,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=17241, grad_norm=0.0143, lr=0.0002, loss=0.195]
Train Epoch #2:  66%|██████▌   | 6858/10399 [1:16:17<38:32,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=17257, grad_norm=0.0133, lr=0.0002, loss=0.195]
Train Epoch #2:  66%|██████▌   | 6873/10399 [1:16:27<38:22,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=17272, grad_norm=0.0136, lr=0.0002, loss=0.195]
Train Epoch #2:  66%|██████▌   | 6874/10399 [1:16:27<38:19,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=17273, grad_norm=0.0122, lr=0.0002, loss=0.195]
Train Epoch #2:  66%|██████▋   | 6890/10399 [1:16:38<38:09,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=17289, grad_norm=0.0127, lr=0.0002, loss=0.195]
Train Epoch #2:  66%|██████▋   | 6906/10399 [1:16:48<38:00,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=17305, grad_norm=0.0131, lr=0.0002, loss=0.195]
Train Epoch #2:  67%|██████▋   | 6922/10399 [1:16:59<37:47,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=17321, grad_norm=0.0141, lr=0.0002, loss=0.195]
Train Epoch #2:  67%|██████▋   | 6938/10399 [1:17:09<37:37,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=17337, grad_norm=0.011, lr=0.0002, loss=0.195] 
Train Epoch #2:  67%|██████▋   | 6954/10399 [1:17:19<37:25,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=17353, grad_norm=0.0138, lr=0.0002, loss=0.195]
Train Epoch #2:  67%|██████▋   | 6970/10399 [1:17:30<37:15,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=17369, grad_norm=0.0134, lr=0.0002, loss=0.195]
Train Epoch #2:  67%|██████▋   | 6986/10399 [1:17:40<37:05,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=17385, grad_norm=0.0128, lr=0.0002, loss=0.195]
Train Epoch #2:  67%|██████▋   | 7002/10399 [1:17:51<36:54,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=17401, grad_norm=0.0126, lr=0.0002, loss=0.195]
Train Epoch #2:  67%|██████▋   | 7018/10399 [1:18:01<36:45,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=17417, grad_norm=0.0153, lr=0.0002, loss=0.195]
Train Epoch #2:  68%|██████▊   | 7033/10399 [1:18:11<36:35,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=17432, grad_norm=0.0133, lr=0.0002, loss=0.195]
Train Epoch #2:  68%|██████▊   | 7034/10399 [1:18:12<36:33,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=17433, grad_norm=0.0182, lr=0.0002, loss=0.195]
Train Epoch #2:  68%|██████▊   | 7050/10399 [1:18:22<36:23,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=17449, grad_norm=0.0133, lr=0.0002, loss=0.195]
Train Epoch #2:  68%|██████▊   | 7066/10399 [1:18:32<36:11,  1.54it/s, shape=torch.Size([128, 32, 16, 16]), global_step=17465, grad_norm=0.0128, lr=0.0002, loss=0.195]
Train Epoch #2:  68%|██████▊   | 7082/10399 [1:18:43<36:08,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=17481, grad_norm=0.0134, lr=0.0002, loss=0.195]
Train Epoch #2:  68%|██████▊   | 7098/10399 [1:18:53<35:54,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=17497, grad_norm=0.0124, lr=0.0002, loss=0.195]
Train Epoch #2:  68%|██████▊   | 7114/10399 [1:19:04<35:42,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=17513, grad_norm=0.0126, lr=0.0002, loss=0.195]
Train Epoch #2:  69%|██████▊   | 7130/10399 [1:19:14<35:30,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=17529, grad_norm=0.0137, lr=0.0002, loss=0.195]
Train Epoch #2:  69%|██████▊   | 7146/10399 [1:19:25<35:19,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=17545, grad_norm=0.0131, lr=0.0002, loss=0.195]
Train Epoch #2:  69%|██████▉   | 7162/10399 [1:19:35<35:08,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=17561, grad_norm=0.0135, lr=0.0002, loss=0.195]
Train Epoch #2:  69%|██████▉   | 7178/10399 [1:19:45<34:58,  1.54it/s, shape=torch.Size([128, 32, 16, 16]), global_step=17577, grad_norm=0.0166, lr=0.0002, loss=0.195]
Train Epoch #2:  69%|██████▉   | 7194/10399 [1:19:56<34:46,  1.54it/s, shape=torch.Size([128, 32, 16, 16]), global_step=17593, grad_norm=0.0126, lr=0.0002, loss=0.195]
Train Epoch #2:  69%|██████▉   | 7210/10399 [1:20:06<34:35,  1.54it/s, shape=torch.Size([128, 32, 16, 16]), global_step=17609, grad_norm=0.0133, lr=0.0002, loss=0.195]
Train Epoch #2:  69%|██████▉   | 7226/10399 [1:20:17<34:26,  1.54it/s, shape=torch.Size([128, 32, 16, 16]), global_step=17625, grad_norm=0.0139, lr=0.0002, loss=0.195]
Train Epoch #2:  70%|██████▉   | 7242/10399 [1:20:27<34:14,  1.54it/s, shape=torch.Size([128, 32, 16, 16]), global_step=17641, grad_norm=0.0135, lr=0.0002, loss=0.195]
Train Epoch #2:  70%|██████▉   | 7257/10399 [1:20:37<34:04,  1.54it/s, shape=torch.Size([128, 32, 16, 16]), global_step=17656, grad_norm=0.0143, lr=0.0002, loss=0.195]
Train Epoch #2:  70%|██████▉   | 7258/10399 [1:20:37<34:03,  1.54it/s, shape=torch.Size([128, 32, 16, 16]), global_step=17657, grad_norm=0.013, lr=0.0002, loss=0.195] 
Train Epoch #2:  70%|██████▉   | 7274/10399 [1:20:48<33:53,  1.54it/s, shape=torch.Size([128, 32, 16, 16]), global_step=17673, grad_norm=0.0139, lr=0.0002, loss=0.195]
Train Epoch #2:  70%|███████   | 7290/10399 [1:20:58<33:42,  1.54it/s, shape=torch.Size([128, 32, 16, 16]), global_step=17689, grad_norm=0.013, lr=0.0002, loss=0.195] 
Train Epoch #2:  70%|███████   | 7306/10399 [1:21:09<33:31,  1.54it/s, shape=torch.Size([128, 32, 16, 16]), global_step=17705, grad_norm=0.0129, lr=0.0002, loss=0.195]
Train Epoch #2:  70%|███████   | 7322/10399 [1:21:19<33:21,  1.54it/s, shape=torch.Size([128, 32, 16, 16]), global_step=17721, grad_norm=0.0126, lr=0.0002, loss=0.195]
Train Epoch #2:  71%|███████   | 7338/10399 [1:21:30<33:11,  1.54it/s, shape=torch.Size([128, 32, 16, 16]), global_step=17737, grad_norm=0.013, lr=0.0002, loss=0.195] 
Train Epoch #2:  71%|███████   | 7354/10399 [1:21:40<33:01,  1.54it/s, shape=torch.Size([128, 32, 16, 16]), global_step=17753, grad_norm=0.0116, lr=0.0002, loss=0.195]
Train Epoch #2:  71%|███████   | 7370/10399 [1:21:50<32:51,  1.54it/s, shape=torch.Size([128, 32, 16, 16]), global_step=17769, grad_norm=0.0132, lr=0.0002, loss=0.195]
Train Epoch #2:  71%|███████   | 7386/10399 [1:22:01<32:41,  1.54it/s, shape=torch.Size([128, 32, 16, 16]), global_step=17785, grad_norm=0.0114, lr=0.0002, loss=0.195]
Train Epoch #2:  71%|███████   | 7401/10399 [1:22:11<32:31,  1.54it/s, shape=torch.Size([128, 32, 16, 16]), global_step=17800, grad_norm=0.0122, lr=0.0002, loss=0.195]
Train Epoch #2:  71%|███████   | 7402/10399 [1:22:11<32:30,  1.54it/s, shape=torch.Size([128, 32, 16, 16]), global_step=17801, grad_norm=0.0133, lr=0.0002, loss=0.195]
Train Epoch #2:  71%|███████▏  | 7418/10399 [1:22:22<32:19,  1.54it/s, shape=torch.Size([128, 32, 16, 16]), global_step=17817, grad_norm=0.0128, lr=0.0002, loss=0.195]
Train Epoch #2:  71%|███████▏  | 7434/10399 [1:22:32<32:09,  1.54it/s, shape=torch.Size([128, 32, 16, 16]), global_step=17833, grad_norm=0.0125, lr=0.0002, loss=0.195]
Train Epoch #2:  72%|███████▏  | 7450/10399 [1:22:42<31:59,  1.54it/s, shape=torch.Size([128, 32, 16, 16]), global_step=17849, grad_norm=0.0134, lr=0.0002, loss=0.195]
Train Epoch #2:  72%|███████▏  | 7466/10399 [1:22:53<31:49,  1.54it/s, shape=torch.Size([128, 32, 16, 16]), global_step=17865, grad_norm=0.0143, lr=0.0002, loss=0.195]
Train Epoch #2:  72%|███████▏  | 7482/10399 [1:23:03<31:38,  1.54it/s, shape=torch.Size([128, 32, 16, 16]), global_step=17881, grad_norm=0.0167, lr=0.0002, loss=0.195]
Train Epoch #2:  72%|███████▏  | 7498/10399 [1:23:14<31:27,  1.54it/s, shape=torch.Size([128, 32, 16, 16]), global_step=17897, grad_norm=0.0137, lr=0.0002, loss=0.195]
Train Epoch #2:  72%|███████▏  | 7514/10399 [1:23:24<31:25,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=17913, grad_norm=0.0121, lr=0.0002, loss=0.195]
Train Epoch #2:  72%|███████▏  | 7530/10399 [1:23:35<31:12,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=17929, grad_norm=0.0134, lr=0.0002, loss=0.195]
Train Epoch #2:  73%|███████▎  | 7546/10399 [1:23:45<31:00,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=17945, grad_norm=0.0154, lr=0.0002, loss=0.195]
Train Epoch #2:  73%|███████▎  | 7562/10399 [1:23:55<30:48,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=17961, grad_norm=0.0128, lr=0.0002, loss=0.195]
Train Epoch #2:  73%|███████▎  | 7578/10399 [1:24:06<30:37,  1.54it/s, shape=torch.Size([128, 32, 16, 16]), global_step=17977, grad_norm=0.0131, lr=0.0002, loss=0.195]
Train Epoch #2:  73%|███████▎  | 7594/10399 [1:24:16<30:27,  1.54it/s, shape=torch.Size([128, 32, 16, 16]), global_step=17993, grad_norm=0.0122, lr=0.0002, loss=0.195]
Train Epoch #2:  73%|███████▎  | 7601/10399 [1:24:27<30:22,  1.54it/s, shape=torch.Size([128, 32, 16, 16]), global_step=18000, grad_norm=0.0142, lr=0.0002, loss=0.195]
save model to .exp/diffusion/imagenet_512/dc_ae_f32c32_in_1.0/dit_xl_1/bs_1024_lr_2e-4_fp16/checkpoint.pt at step 18000
Train Epoch #2:  73%|███████▎  | 7602/10399 [1:24:35<44:11,  1.05it/s, shape=torch.Size([128, 32, 16, 16]), global_step=18001, grad_norm=0.0129, lr=0.0002, loss=0.195]
Train Epoch #2:  73%|███████▎  | 7618/10399 [1:24:45<39:17,  1.18it/s, shape=torch.Size([128, 32, 16, 16]), global_step=18017, grad_norm=0.0136, lr=0.0002, loss=0.195]
Train Epoch #2:  73%|███████▎  | 7634/10399 [1:24:56<36:07,  1.28it/s, shape=torch.Size([128, 32, 16, 16]), global_step=18033, grad_norm=0.0143, lr=0.0002, loss=0.195]
Train Epoch #2:  74%|███████▎  | 7650/10399 [1:25:06<33:58,  1.35it/s, shape=torch.Size([128, 32, 16, 16]), global_step=18049, grad_norm=0.0114, lr=0.0002, loss=0.195]
Train Epoch #2:  74%|███████▎  | 7666/10399 [1:25:17<32:30,  1.40it/s, shape=torch.Size([128, 32, 16, 16]), global_step=18065, grad_norm=0.0156, lr=0.0002, loss=0.195]
Train Epoch #2:  74%|███████▍  | 7682/10399 [1:25:27<31:26,  1.44it/s, shape=torch.Size([128, 32, 16, 16]), global_step=18081, grad_norm=0.0123, lr=0.0002, loss=0.195]
Train Epoch #2:  74%|███████▍  | 7697/10399 [1:25:37<31:16,  1.44it/s, shape=torch.Size([128, 32, 16, 16]), global_step=18096, grad_norm=0.0124, lr=0.0002, loss=0.195]
Train Epoch #2:  74%|███████▍  | 7698/10399 [1:25:37<30:40,  1.47it/s, shape=torch.Size([128, 32, 16, 16]), global_step=18097, grad_norm=0.0128, lr=0.0002, loss=0.195]
Train Epoch #2:  74%|███████▍  | 7714/10399 [1:25:48<30:06,  1.49it/s, shape=torch.Size([128, 32, 16, 16]), global_step=18113, grad_norm=0.0125, lr=0.0002, loss=0.195]
Train Epoch #2:  74%|███████▍  | 7730/10399 [1:25:58<29:38,  1.50it/s, shape=torch.Size([128, 32, 16, 16]), global_step=18129, grad_norm=0.0136, lr=0.0002, loss=0.195]
Train Epoch #2:  74%|███████▍  | 7746/10399 [1:26:09<29:15,  1.51it/s, shape=torch.Size([128, 32, 16, 16]), global_step=18145, grad_norm=0.0132, lr=0.0002, loss=0.195]
Train Epoch #2:  75%|███████▍  | 7762/10399 [1:26:19<28:55,  1.52it/s, shape=torch.Size([128, 32, 16, 16]), global_step=18161, grad_norm=0.0128, lr=0.0002, loss=0.195]
Train Epoch #2:  75%|███████▍  | 7778/10399 [1:26:30<28:40,  1.52it/s, shape=torch.Size([128, 32, 16, 16]), global_step=18177, grad_norm=0.0127, lr=0.0002, loss=0.195]
Train Epoch #2:  75%|███████▍  | 7794/10399 [1:26:40<28:25,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=18193, grad_norm=0.013, lr=0.0002, loss=0.195] 
Train Epoch #2:  75%|███████▌  | 7810/10399 [1:26:50<28:12,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=18209, grad_norm=0.0138, lr=0.0002, loss=0.195]
Train Epoch #2:  75%|███████▌  | 7826/10399 [1:27:01<28:00,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=18225, grad_norm=0.0124, lr=0.0002, loss=0.195]
Train Epoch #2:  75%|███████▌  | 7842/10399 [1:27:11<27:48,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=18241, grad_norm=0.0125, lr=0.0002, loss=0.195]
Train Epoch #2:  76%|███████▌  | 7857/10399 [1:27:21<27:38,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=18256, grad_norm=0.0125, lr=0.0002, loss=0.195]
Train Epoch #2:  76%|███████▌  | 7858/10399 [1:27:22<27:36,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=18257, grad_norm=0.0144, lr=0.0002, loss=0.195]
Train Epoch #2:  76%|███████▌  | 7874/10399 [1:27:32<27:26,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=18273, grad_norm=0.012, lr=0.0002, loss=0.195] 
Train Epoch #2:  76%|███████▌  | 7890/10399 [1:27:42<27:15,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=18289, grad_norm=0.0132, lr=0.0002, loss=0.195]
Train Epoch #2:  76%|███████▌  | 7906/10399 [1:27:53<27:11,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=18305, grad_norm=0.0133, lr=0.0002, loss=0.195]
Train Epoch #2:  76%|███████▌  | 7922/10399 [1:28:03<26:58,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=18321, grad_norm=0.0125, lr=0.0002, loss=0.195]
Train Epoch #2:  76%|███████▋  | 7938/10399 [1:28:14<26:46,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=18337, grad_norm=0.0128, lr=0.0002, loss=0.195]
Train Epoch #2:  76%|███████▋  | 7954/10399 [1:28:24<26:34,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=18353, grad_norm=0.0119, lr=0.0002, loss=0.195]
Train Epoch #2:  77%|███████▋  | 7970/10399 [1:28:35<26:22,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=18369, grad_norm=0.013, lr=0.0002, loss=0.195] 
Train Epoch #2:  77%|███████▋  | 7986/10399 [1:28:45<26:12,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=18385, grad_norm=0.0137, lr=0.0002, loss=0.195]
Train Epoch #2:  77%|███████▋  | 8002/10399 [1:28:56<26:01,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=18401, grad_norm=0.0127, lr=0.0002, loss=0.195]
Train Epoch #2:  77%|███████▋  | 8018/10399 [1:29:06<25:51,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=18417, grad_norm=0.0119, lr=0.0002, loss=0.195]
Train Epoch #2:  77%|███████▋  | 8034/10399 [1:29:16<25:40,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=18433, grad_norm=0.0123, lr=0.0002, loss=0.195]
Train Epoch #2:  77%|███████▋  | 8050/10399 [1:29:27<25:30,  1.54it/s, shape=torch.Size([128, 32, 16, 16]), global_step=18449, grad_norm=0.0132, lr=0.0002, loss=0.195]
Train Epoch #2:  78%|███████▊  | 8065/10399 [1:29:37<25:20,  1.54it/s, shape=torch.Size([128, 32, 16, 16]), global_step=18464, grad_norm=0.0126, lr=0.0002, loss=0.195]
Train Epoch #2:  78%|███████▊  | 8066/10399 [1:29:37<25:20,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=18465, grad_norm=0.0135, lr=0.0002, loss=0.195]
Train Epoch #2:  78%|███████▊  | 8082/10399 [1:29:48<25:09,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=18481, grad_norm=0.0115, lr=0.0002, loss=0.195]
Train Epoch #2:  78%|███████▊  | 8098/10399 [1:29:58<24:58,  1.54it/s, shape=torch.Size([128, 32, 16, 16]), global_step=18497, grad_norm=0.0127, lr=0.0002, loss=0.195]
Train Epoch #2:  78%|███████▊  | 8114/10399 [1:30:09<24:49,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=18513, grad_norm=0.0122, lr=0.0002, loss=0.195]
Train Epoch #2:  78%|███████▊  | 8130/10399 [1:30:19<24:38,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=18529, grad_norm=0.0139, lr=0.0002, loss=0.195]
Train Epoch #2:  78%|███████▊  | 8146/10399 [1:30:29<24:28,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=18545, grad_norm=0.013, lr=0.0002, loss=0.195] 
Train Epoch #2:  78%|███████▊  | 8162/10399 [1:30:40<24:17,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=18561, grad_norm=0.0135, lr=0.0002, loss=0.195]
Train Epoch #2:  79%|███████▊  | 8178/10399 [1:30:50<24:07,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=18577, grad_norm=0.0136, lr=0.0002, loss=0.195]
Train Epoch #2:  79%|███████▉  | 8194/10399 [1:31:01<23:56,  1.54it/s, shape=torch.Size([128, 32, 16, 16]), global_step=18593, grad_norm=0.0124, lr=0.0002, loss=0.195]
Train Epoch #2:  79%|███████▉  | 8210/10399 [1:31:11<23:46,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=18609, grad_norm=0.0135, lr=0.0002, loss=0.195]
Train Epoch #2:  79%|███████▉  | 8225/10399 [1:31:21<23:36,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=18624, grad_norm=0.0126, lr=0.0002, loss=0.195]
Train Epoch #2:  79%|███████▉  | 8226/10399 [1:31:21<23:35,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=18625, grad_norm=0.0132, lr=0.0002, loss=0.195]
Train Epoch #2:  79%|███████▉  | 8242/10399 [1:31:32<23:24,  1.54it/s, shape=torch.Size([128, 32, 16, 16]), global_step=18641, grad_norm=0.0127, lr=0.0002, loss=0.195]
Train Epoch #2:  79%|███████▉  | 8258/10399 [1:31:42<23:19,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=18657, grad_norm=0.0132, lr=0.0002, loss=0.195]
Train Epoch #2:  80%|███████▉  | 8274/10399 [1:31:53<23:07,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=18673, grad_norm=0.0119, lr=0.0002, loss=0.195]
Train Epoch #2:  80%|███████▉  | 8290/10399 [1:32:03<22:55,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=18689, grad_norm=0.0118, lr=0.0002, loss=0.195]
Train Epoch #2:  80%|███████▉  | 8306/10399 [1:32:14<22:45,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=18705, grad_norm=0.012, lr=0.0002, loss=0.195] 
Train Epoch #2:  80%|████████  | 8322/10399 [1:32:24<22:34,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=18721, grad_norm=0.0126, lr=0.0002, loss=0.195]
Train Epoch #2:  80%|████████  | 8338/10399 [1:32:35<22:23,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=18737, grad_norm=0.0122, lr=0.0002, loss=0.195]
Train Epoch #2:  80%|████████  | 8354/10399 [1:32:45<22:13,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=18753, grad_norm=0.0137, lr=0.0002, loss=0.195]
Train Epoch #2:  80%|████████  | 8370/10399 [1:32:56<22:07,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=18769, grad_norm=0.0135, lr=0.0002, loss=0.195]
Train Epoch #2:  81%|████████  | 8386/10399 [1:33:06<22:02,  1.52it/s, shape=torch.Size([128, 32, 16, 16]), global_step=18785, grad_norm=0.0158, lr=0.0002, loss=0.195]
Train Epoch #2:  81%|████████  | 8402/10399 [1:33:17<21:48,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=18801, grad_norm=0.0118, lr=0.0002, loss=0.195]
Train Epoch #2:  81%|████████  | 8418/10399 [1:33:27<21:35,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=18817, grad_norm=0.013, lr=0.0002, loss=0.195] 
Train Epoch #2:  81%|████████  | 8433/10399 [1:33:37<21:25,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=18832, grad_norm=0.0128, lr=0.0002, loss=0.195]
Train Epoch #2:  81%|████████  | 8434/10399 [1:33:38<21:28,  1.52it/s, shape=torch.Size([128, 32, 16, 16]), global_step=18833, grad_norm=0.0123, lr=0.0002, loss=0.195]
Train Epoch #2:  81%|████████▏ | 8450/10399 [1:33:48<21:22,  1.52it/s, shape=torch.Size([128, 32, 16, 16]), global_step=18849, grad_norm=0.0119, lr=0.0002, loss=0.195]
Train Epoch #2:  81%|████████▏ | 8466/10399 [1:33:59<21:08,  1.52it/s, shape=torch.Size([128, 32, 16, 16]), global_step=18865, grad_norm=0.013, lr=0.0002, loss=0.195] 
Train Epoch #2:  82%|████████▏ | 8482/10399 [1:34:09<20:56,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=18881, grad_norm=0.0124, lr=0.0002, loss=0.195]
Train Epoch #2:  82%|████████▏ | 8498/10399 [1:34:19<20:44,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=18897, grad_norm=0.0135, lr=0.0002, loss=0.195]
Train Epoch #2:  82%|████████▏ | 8514/10399 [1:34:30<20:32,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=18913, grad_norm=0.0118, lr=0.0002, loss=0.195]
Train Epoch #2:  82%|████████▏ | 8530/10399 [1:34:40<20:23,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=18929, grad_norm=0.0125, lr=0.0002, loss=0.195]
Train Epoch #2:  82%|████████▏ | 8546/10399 [1:34:51<20:19,  1.52it/s, shape=torch.Size([128, 32, 16, 16]), global_step=18945, grad_norm=0.0113, lr=0.0002, loss=0.195]
Train Epoch #2:  82%|████████▏ | 8561/10399 [1:35:01<20:09,  1.52it/s, shape=torch.Size([128, 32, 16, 16]), global_step=18960, grad_norm=0.0135, lr=0.0002, loss=0.195]
Train Epoch #2:  82%|████████▏ | 8562/10399 [1:35:02<20:11,  1.52it/s, shape=torch.Size([128, 32, 16, 16]), global_step=18961, grad_norm=0.0126, lr=0.0002, loss=0.195]
Train Epoch #2:  82%|████████▏ | 8578/10399 [1:35:12<19:57,  1.52it/s, shape=torch.Size([128, 32, 16, 16]), global_step=18977, grad_norm=0.0126, lr=0.0002, loss=0.195]
Train Epoch #2:  83%|████████▎ | 8594/10399 [1:35:23<19:44,  1.52it/s, shape=torch.Size([128, 32, 16, 16]), global_step=18993, grad_norm=0.0124, lr=0.0002, loss=0.195]
Train Epoch #2:  83%|████████▎ | 8601/10399 [1:35:37<19:39,  1.52it/s, shape=torch.Size([128, 32, 16, 16]), global_step=19000, grad_norm=0.0128, lr=0.0002, loss=0.195]
save model to .exp/diffusion/imagenet_512/dc_ae_f32c32_in_1.0/dit_xl_1/bs_1024_lr_2e-4_fp16/checkpoint.pt at step 19000
Train Epoch #2:  83%|████████▎ | 8602/10399 [1:35:42<28:45,  1.04it/s, shape=torch.Size([128, 32, 16, 16]), global_step=19001, grad_norm=0.013, lr=0.0002, loss=0.195] 
Train Epoch #2:  83%|████████▎ | 8618/10399 [1:35:52<25:26,  1.17it/s, shape=torch.Size([128, 32, 16, 16]), global_step=19017, grad_norm=0.0118, lr=0.0002, loss=0.195]
Train Epoch #2:  83%|████████▎ | 8634/10399 [1:36:02<23:15,  1.27it/s, shape=torch.Size([128, 32, 16, 16]), global_step=19033, grad_norm=0.0127, lr=0.0002, loss=0.195]
Train Epoch #2:  83%|████████▎ | 8650/10399 [1:36:13<21:45,  1.34it/s, shape=torch.Size([128, 32, 16, 16]), global_step=19049, grad_norm=0.0122, lr=0.0002, loss=0.195]
Train Epoch #2:  83%|████████▎ | 8666/10399 [1:36:23<20:46,  1.39it/s, shape=torch.Size([128, 32, 16, 16]), global_step=19065, grad_norm=0.0119, lr=0.0002, loss=0.195]
Train Epoch #2:  83%|████████▎ | 8682/10399 [1:36:34<19:58,  1.43it/s, shape=torch.Size([128, 32, 16, 16]), global_step=19081, grad_norm=0.0136, lr=0.0002, loss=0.195]
Train Epoch #2:  84%|████████▎ | 8698/10399 [1:36:44<19:22,  1.46it/s, shape=torch.Size([128, 32, 16, 16]), global_step=19097, grad_norm=0.013, lr=0.0002, loss=0.195] 
Train Epoch #2:  84%|████████▍ | 8714/10399 [1:36:55<18:56,  1.48it/s, shape=torch.Size([128, 32, 16, 16]), global_step=19113, grad_norm=0.0115, lr=0.0002, loss=0.195]
Train Epoch #2:  84%|████████▍ | 8730/10399 [1:37:05<18:34,  1.50it/s, shape=torch.Size([128, 32, 16, 16]), global_step=19129, grad_norm=0.0121, lr=0.0002, loss=0.195]
Train Epoch #2:  84%|████████▍ | 8746/10399 [1:37:16<18:16,  1.51it/s, shape=torch.Size([128, 32, 16, 16]), global_step=19145, grad_norm=0.0132, lr=0.0002, loss=0.195]
Train Epoch #2:  84%|████████▍ | 8762/10399 [1:37:26<18:00,  1.52it/s, shape=torch.Size([128, 32, 16, 16]), global_step=19161, grad_norm=0.0116, lr=0.0002, loss=0.195]
Train Epoch #2:  84%|████████▍ | 8778/10399 [1:37:36<17:45,  1.52it/s, shape=torch.Size([128, 32, 16, 16]), global_step=19177, grad_norm=0.0129, lr=0.0002, loss=0.195]
Train Epoch #2:  85%|████████▍ | 8794/10399 [1:37:47<17:31,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=19193, grad_norm=0.0132, lr=0.0002, loss=0.195]
Train Epoch #2:  85%|████████▍ | 8809/10399 [1:37:57<17:21,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=19208, grad_norm=0.0131, lr=0.0002, loss=0.195]
Train Epoch #2:  85%|████████▍ | 8810/10399 [1:37:57<17:18,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=19209, grad_norm=0.0121, lr=0.0002, loss=0.195]
Train Epoch #2:  85%|████████▍ | 8826/10399 [1:38:08<17:07,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=19225, grad_norm=0.0142, lr=0.0002, loss=0.195]
Train Epoch #2:  85%|████████▌ | 8842/10399 [1:38:18<16:56,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=19241, grad_norm=0.0143, lr=0.0002, loss=0.195]
Train Epoch #2:  85%|████████▌ | 8858/10399 [1:38:29<16:45,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=19257, grad_norm=0.0156, lr=0.0002, loss=0.195]
Train Epoch #2:  85%|████████▌ | 8874/10399 [1:38:39<16:34,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=19273, grad_norm=0.0137, lr=0.0002, loss=0.195]
Train Epoch #2:  85%|████████▌ | 8890/10399 [1:38:49<16:24,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=19289, grad_norm=0.0138, lr=0.0002, loss=0.195]
Train Epoch #2:  86%|████████▌ | 8906/10399 [1:39:00<16:13,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=19305, grad_norm=0.0134, lr=0.0002, loss=0.195]
Train Epoch #2:  86%|████████▌ | 8922/10399 [1:39:10<16:02,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=19321, grad_norm=0.0123, lr=0.0002, loss=0.195]
Train Epoch #2:  86%|████████▌ | 8938/10399 [1:39:21<15:52,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=19337, grad_norm=0.014, lr=0.0002, loss=0.195] 
Train Epoch #2:  86%|████████▌ | 8954/10399 [1:39:31<15:42,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=19353, grad_norm=0.0119, lr=0.0002, loss=0.195]
Train Epoch #2:  86%|████████▌ | 8969/10399 [1:39:41<15:32,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=19368, grad_norm=0.0121, lr=0.0002, loss=0.195]
Train Epoch #2:  86%|████████▋ | 8970/10399 [1:39:42<15:32,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=19369, grad_norm=0.0128, lr=0.0002, loss=0.195]
Train Epoch #2:  86%|████████▋ | 8986/10399 [1:39:52<15:21,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=19385, grad_norm=0.0131, lr=0.0002, loss=0.195]
Train Epoch #2:  87%|████████▋ | 9002/10399 [1:40:03<15:11,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=19401, grad_norm=0.0126, lr=0.0002, loss=0.195]
Train Epoch #2:  87%|████████▋ | 9018/10399 [1:40:13<15:01,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=19417, grad_norm=0.0127, lr=0.0002, loss=0.195]
Train Epoch #2:  87%|████████▋ | 9034/10399 [1:40:23<14:50,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=19433, grad_norm=0.0117, lr=0.0002, loss=0.195]
Train Epoch #2:  87%|████████▋ | 9050/10399 [1:40:34<14:43,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=19449, grad_norm=0.0123, lr=0.0002, loss=0.195]
Train Epoch #2:  87%|████████▋ | 9066/10399 [1:40:45<14:37,  1.52it/s, shape=torch.Size([128, 32, 16, 16]), global_step=19465, grad_norm=0.0145, lr=0.0002, loss=0.195]
Train Epoch #2:  87%|████████▋ | 9082/10399 [1:40:55<14:24,  1.52it/s, shape=torch.Size([128, 32, 16, 16]), global_step=19481, grad_norm=0.0118, lr=0.0002, loss=0.195]
Train Epoch #2:  87%|████████▋ | 9098/10399 [1:41:06<14:21,  1.51it/s, shape=torch.Size([128, 32, 16, 16]), global_step=19497, grad_norm=0.0143, lr=0.0002, loss=0.195]
Train Epoch #2:  88%|████████▊ | 9113/10399 [1:41:17<14:38,  1.46it/s, shape=torch.Size([128, 32, 16, 16]), global_step=19512, grad_norm=0.0145, lr=0.0002, loss=0.195]
Train Epoch #2:  88%|████████▊ | 9128/10399 [1:41:27<14:27,  1.46it/s, shape=torch.Size([128, 32, 16, 16]), global_step=19527, grad_norm=0.0128, lr=0.0002, loss=0.195]
Train Epoch #2:  88%|████████▊ | 9129/10399 [1:41:27<14:14,  1.49it/s, shape=torch.Size([128, 32, 16, 16]), global_step=19528, grad_norm=0.0151, lr=0.0002, loss=0.195]
Train Epoch #2:  88%|████████▊ | 9145/10399 [1:41:38<13:55,  1.50it/s, shape=torch.Size([128, 32, 16, 16]), global_step=19544, grad_norm=0.0128, lr=0.0002, loss=0.195]
Train Epoch #2:  88%|████████▊ | 9161/10399 [1:41:48<13:39,  1.51it/s, shape=torch.Size([128, 32, 16, 16]), global_step=19560, grad_norm=0.013, lr=0.0002, loss=0.195] 
Train Epoch #2:  88%|████████▊ | 9177/10399 [1:41:59<13:25,  1.52it/s, shape=torch.Size([128, 32, 16, 16]), global_step=19576, grad_norm=0.0142, lr=0.0002, loss=0.195]
Train Epoch #2:  88%|████████▊ | 9193/10399 [1:42:09<13:11,  1.52it/s, shape=torch.Size([128, 32, 16, 16]), global_step=19592, grad_norm=0.015, lr=0.0002, loss=0.195] 
Train Epoch #2:  89%|████████▊ | 9209/10399 [1:42:19<12:58,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=19608, grad_norm=0.0117, lr=0.0002, loss=0.195]
Train Epoch #2:  89%|████████▊ | 9225/10399 [1:42:30<12:46,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=19624, grad_norm=0.012, lr=0.0002, loss=0.195] 
Train Epoch #2:  89%|████████▉ | 9241/10399 [1:42:40<12:35,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=19640, grad_norm=0.0118, lr=0.0002, loss=0.195]
Train Epoch #2:  89%|████████▉ | 9257/10399 [1:42:51<12:24,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=19656, grad_norm=0.0139, lr=0.0002, loss=0.195]
Train Epoch #2:  89%|████████▉ | 9273/10399 [1:43:01<12:13,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=19672, grad_norm=0.0108, lr=0.0002, loss=0.195]
Train Epoch #2:  89%|████████▉ | 9289/10399 [1:43:11<12:02,  1.54it/s, shape=torch.Size([128, 32, 16, 16]), global_step=19688, grad_norm=0.012, lr=0.0002, loss=0.195] 
Train Epoch #2:  89%|████████▉ | 9304/10399 [1:43:21<11:53,  1.54it/s, shape=torch.Size([128, 32, 16, 16]), global_step=19703, grad_norm=0.015, lr=0.0002, loss=0.195]
Train Epoch #2:  89%|████████▉ | 9305/10399 [1:43:22<11:52,  1.54it/s, shape=torch.Size([128, 32, 16, 16]), global_step=19704, grad_norm=0.0124, lr=0.0002, loss=0.195]
Train Epoch #2:  90%|████████▉ | 9321/10399 [1:43:32<11:42,  1.54it/s, shape=torch.Size([128, 32, 16, 16]), global_step=19720, grad_norm=0.0133, lr=0.0002, loss=0.195]
Train Epoch #2:  90%|████████▉ | 9337/10399 [1:43:43<11:31,  1.54it/s, shape=torch.Size([128, 32, 16, 16]), global_step=19736, grad_norm=0.0155, lr=0.0002, loss=0.195]
Train Epoch #2:  90%|████████▉ | 9353/10399 [1:43:53<11:20,  1.54it/s, shape=torch.Size([128, 32, 16, 16]), global_step=19752, grad_norm=0.0127, lr=0.0002, loss=0.195]
Train Epoch #2:  90%|█████████ | 9369/10399 [1:44:03<11:10,  1.54it/s, shape=torch.Size([128, 32, 16, 16]), global_step=19768, grad_norm=0.0113, lr=0.0002, loss=0.195]
Train Epoch #2:  90%|█████████ | 9385/10399 [1:44:14<11:00,  1.54it/s, shape=torch.Size([128, 32, 16, 16]), global_step=19784, grad_norm=0.0115, lr=0.0002, loss=0.195]
Train Epoch #2:  90%|█████████ | 9401/10399 [1:44:24<10:50,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=19800, grad_norm=0.0119, lr=0.0002, loss=0.195]
Train Epoch #2:  91%|█████████ | 9417/10399 [1:44:35<10:39,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=19816, grad_norm=0.0127, lr=0.0002, loss=0.194]
Train Epoch #2:  91%|█████████ | 9433/10399 [1:44:45<10:29,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=19832, grad_norm=0.0123, lr=0.0002, loss=0.194]
Train Epoch #2:  91%|█████████ | 9449/10399 [1:44:56<10:21,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=19848, grad_norm=0.0122, lr=0.0002, loss=0.194]
Train Epoch #2:  91%|█████████ | 9465/10399 [1:45:06<10:10,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=19864, grad_norm=0.0129, lr=0.0002, loss=0.194]
Train Epoch #2:  91%|█████████ | 9481/10399 [1:45:17<09:59,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=19880, grad_norm=0.0117, lr=0.0002, loss=0.194]
Train Epoch #2:  91%|█████████▏| 9497/10399 [1:45:27<09:48,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=19896, grad_norm=0.0136, lr=0.0002, loss=0.194]
Train Epoch #2:  91%|█████████▏| 9512/10399 [1:45:37<09:38,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=19911, grad_norm=0.0127, lr=0.0002, loss=0.194]
Train Epoch #2:  91%|█████████▏| 9513/10399 [1:45:38<09:37,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=19912, grad_norm=0.0119, lr=0.0002, loss=0.194]
Train Epoch #2:  92%|█████████▏| 9529/10399 [1:45:48<09:27,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=19928, grad_norm=0.0146, lr=0.0002, loss=0.194]
Train Epoch #2:  92%|█████████▏| 9545/10399 [1:45:58<09:16,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=19944, grad_norm=0.0144, lr=0.0002, loss=0.194]
Train Epoch #2:  92%|█████████▏| 9561/10399 [1:46:09<09:06,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=2e+4, grad_norm=0.0125, lr=0.0002, loss=0.194] 
Train Epoch #2:  92%|█████████▏| 9577/10399 [1:46:19<08:56,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=2e+4, grad_norm=0.0136, lr=0.0002, loss=0.194]
Train Epoch #2:  92%|█████████▏| 9593/10399 [1:46:30<08:45,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=2e+4, grad_norm=0.0115, lr=0.0002, loss=0.194]

Valid Step #20000:   0%|          | 0/1563 [00:00<?, ?it/s]
Train Epoch #2:  92%|█████████▏| 9601/10399 [1:46:41<08:40,  1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=2e+4, grad_norm=0.0127, lr=0.0002, loss=0.194]

Valid Step #20000:   0%|          | 1/1563 [00:17<7:33:16, 17.41s/it]

Valid Step #20000:   0%|          | 2/1563 [00:33<7:07:06, 16.42s/it]