GiusFra committed on
Commit e04eea1 · verified · 1 Parent(s): 5953766

Upload fp8_att/quant_params.json with huggingface_hub

Files changed (1)
  1. fp8_att/quant_params.json +2178 -0
fp8_att/quant_params.json ADDED
@@ -0,0 +1,2178 @@
+ {
+   "time_text_embed.timestep_embedder.linear_1": {},
+   "time_text_embed.timestep_embedder.linear_2": {},
+   "time_text_embed.guidance_embedder.linear_1": {},
+   "time_text_embed.guidance_embedder.linear_2": {},
+   "time_text_embed.text_embedder.linear_1": {},
+   "time_text_embed.text_embedder.linear_2": {},
+   "context_embedder": {},
+   "x_embedder": {},
+   "transformer_blocks.0.norm1.linear": {},
+   "transformer_blocks.0.norm1_context.linear": {},
+   "transformer_blocks.0.attn.to_q": {},
+   "transformer_blocks.0.attn.to_k": {},
+   "transformer_blocks.0.attn.to_v": {},
+   "transformer_blocks.0.attn.add_k_proj": {},
+   "transformer_blocks.0.attn.add_v_proj": {},
+   "transformer_blocks.0.attn.add_q_proj": {},
+   "transformer_blocks.0.attn.to_out.0": {},
+   "transformer_blocks.0.attn.to_add_out": {},
+   "transformer_blocks.0.attn.to_qkv": {},
+   "transformer_blocks.0.attn.to_added_qkv": {},
+   "transformer_blocks.0.attn.output_softmax_quant": {
+     "act_scale": 0.0030924479942768812,
+     "act_scale_shape": [],
+     "act_zp": 0.0,
+     "act_zp_shape": [],
+     "act_zp_dtype": "torch.float8_e4m3fnuz"
+   },
+   "transformer_blocks.0.attn.out_q": {
+     "act_scale": 0.1770833283662796,
+     "act_scale_shape": [],
+     "act_zp": 0.0,
+     "act_zp_shape": [],
+     "act_zp_dtype": "torch.float8_e4m3fnuz"
+   },
+   "transformer_blocks.0.attn.out_k": {
+     "act_scale": 0.06666667014360428,
+     "act_scale_shape": [],
+     "act_zp": 0.0,
+     "act_zp_shape": [],
+     "act_zp_dtype": "torch.float8_e4m3fnuz"
+   },
+   "transformer_blocks.0.attn.out_v": {
+     "act_scale": 0.0520833320915699,
+     "act_scale_shape": [],
+     "act_zp": 0.0,
+     "act_zp_shape": [],
+     "act_zp_dtype": "torch.float8_e4m3fnuz"
+   },
+   "transformer_blocks.0.ff.net.0.proj": {},
+   "transformer_blocks.0.ff.net.2": {},
+   "transformer_blocks.0.ff_context.net.0.proj": {},
+   "transformer_blocks.0.ff_context.net.2": {},
+   "transformer_blocks.1.norm1.linear": {},
+   "transformer_blocks.1.norm1_context.linear": {},
+   "transformer_blocks.1.attn.to_q": {},
+   "transformer_blocks.1.attn.to_k": {},
+   "transformer_blocks.1.attn.to_v": {},
+   "transformer_blocks.1.attn.add_k_proj": {},
+   "transformer_blocks.1.attn.add_v_proj": {},
+   "transformer_blocks.1.attn.add_q_proj": {},
+   "transformer_blocks.1.attn.to_out.0": {},
+   "transformer_blocks.1.attn.to_add_out": {},
+   "transformer_blocks.1.attn.to_qkv": {},
+   "transformer_blocks.1.attn.to_added_qkv": {},
+   "transformer_blocks.1.attn.output_softmax_quant": {
+     "act_scale": 0.0030273436568677425,
+     "act_scale_shape": [],
+     "act_zp": 0.0,
+     "act_zp_shape": [],
+     "act_zp_dtype": "torch.float8_e4m3fnuz"
+   },
+   "transformer_blocks.1.attn.out_q": {
+     "act_scale": 0.13333334028720856,
+     "act_scale_shape": [],
+     "act_zp": 0.0,
+     "act_zp_shape": [],
+     "act_zp_dtype": "torch.float8_e4m3fnuz"
+   },
+   "transformer_blocks.1.attn.out_k": {
+     "act_scale": 0.11041666567325592,
+     "act_scale_shape": [],
+     "act_zp": 0.0,
+     "act_zp_shape": [],
+     "act_zp_dtype": "torch.float8_e4m3fnuz"
+   },
+   "transformer_blocks.1.attn.out_v": {
+     "act_scale": 0.109375,
+     "act_scale_shape": [],
+     "act_zp": 0.0,
+     "act_zp_shape": [],
+     "act_zp_dtype": "torch.float8_e4m3fnuz"
+   },
+   "transformer_blocks.1.ff.net.0.proj": {},
+   "transformer_blocks.1.ff.net.2": {},
+   "transformer_blocks.1.ff_context.net.0.proj": {},
+   "transformer_blocks.1.ff_context.net.2": {},
+   "transformer_blocks.2.norm1.linear": {},
+   "transformer_blocks.2.norm1_context.linear": {},
+   "transformer_blocks.2.attn.to_q": {},
+   "transformer_blocks.2.attn.to_k": {},
+   "transformer_blocks.2.attn.to_v": {},
+   "transformer_blocks.2.attn.add_k_proj": {},
+   "transformer_blocks.2.attn.add_v_proj": {},
+   "transformer_blocks.2.attn.add_q_proj": {},
+   "transformer_blocks.2.attn.to_out.0": {},
+   "transformer_blocks.2.attn.to_add_out": {},
+   "transformer_blocks.2.attn.to_qkv": {},
+   "transformer_blocks.2.attn.to_added_qkv": {},
+   "transformer_blocks.2.attn.output_softmax_quant": {
+     "act_scale": 0.0034993488807231188,
+     "act_scale_shape": [],
+     "act_zp": 0.0,
+     "act_zp_shape": [],
+     "act_zp_dtype": "torch.float8_e4m3fnuz"
+   },
+   "transformer_blocks.2.attn.out_q": {
+     "act_scale": 0.13333334028720856,
+     "act_scale_shape": [],
+     "act_zp": 0.0,
+     "act_zp_shape": [],
+     "act_zp_dtype": "torch.float8_e4m3fnuz"
+   },
+   "transformer_blocks.2.attn.out_k": {
+     "act_scale": 0.08020833134651184,
+     "act_scale_shape": [],
+     "act_zp": 0.0,
+     "act_zp_shape": [],
+     "act_zp_dtype": "torch.float8_e4m3fnuz"
+   },
+   "transformer_blocks.2.attn.out_v": {
+     "act_scale": 0.05416666716337204,
+     "act_scale_shape": [],
+     "act_zp": 0.0,
+     "act_zp_shape": [],
+     "act_zp_dtype": "torch.float8_e4m3fnuz"
+   },
+   "transformer_blocks.2.ff.net.0.proj": {},
+   "transformer_blocks.2.ff.net.2": {},
+   "transformer_blocks.2.ff_context.net.0.proj": {},
+   "transformer_blocks.2.ff_context.net.2": {},
+   "transformer_blocks.3.norm1.linear": {},
+   "transformer_blocks.3.norm1_context.linear": {},
+   "transformer_blocks.3.attn.to_q": {},
+   "transformer_blocks.3.attn.to_k": {},
+   "transformer_blocks.3.attn.to_v": {},
+   "transformer_blocks.3.attn.add_k_proj": {},
+   "transformer_blocks.3.attn.add_v_proj": {},
+   "transformer_blocks.3.attn.add_q_proj": {},
+   "transformer_blocks.3.attn.to_out.0": {},
+   "transformer_blocks.3.attn.to_add_out": {},
+   "transformer_blocks.3.attn.to_qkv": {},
+   "transformer_blocks.3.attn.to_added_qkv": {},
+   "transformer_blocks.3.attn.output_softmax_quant": {
+     "act_scale": 0.003971354104578495,
+     "act_scale_shape": [],
+     "act_zp": 0.0,
+     "act_zp_shape": [],
+     "act_zp_dtype": "torch.float8_e4m3fnuz"
+   },
+   "transformer_blocks.3.attn.out_q": {
+     "act_scale": 0.12395833432674408,
+     "act_scale_shape": [],
+     "act_zp": 0.0,
+     "act_zp_shape": [],
+     "act_zp_dtype": "torch.float8_e4m3fnuz"
+   },
+   "transformer_blocks.3.attn.out_k": {
+     "act_scale": 0.09375,
+     "act_scale_shape": [],
+     "act_zp": 0.0,
+     "act_zp_shape": [],
+     "act_zp_dtype": "torch.float8_e4m3fnuz"
+   },
+   "transformer_blocks.3.attn.out_v": {
+     "act_scale": 0.04322916641831398,
+     "act_scale_shape": [],
+     "act_zp": 0.0,
+     "act_zp_shape": [],
+     "act_zp_dtype": "torch.float8_e4m3fnuz"
+   },
+   "transformer_blocks.3.ff.net.0.proj": {},
+   "transformer_blocks.3.ff.net.2": {},
+   "transformer_blocks.3.ff_context.net.0.proj": {},
+   "transformer_blocks.3.ff_context.net.2": {},
+   "transformer_blocks.4.norm1.linear": {},
+   "transformer_blocks.4.norm1_context.linear": {},
+   "transformer_blocks.4.attn.to_q": {},
+   "transformer_blocks.4.attn.to_k": {},
+   "transformer_blocks.4.attn.to_v": {},
+   "transformer_blocks.4.attn.add_k_proj": {},
+   "transformer_blocks.4.attn.add_v_proj": {},
+   "transformer_blocks.4.attn.add_q_proj": {},
+   "transformer_blocks.4.attn.to_out.0": {},
+   "transformer_blocks.4.attn.to_add_out": {},
+   "transformer_blocks.4.attn.to_qkv": {},
+   "transformer_blocks.4.attn.to_added_qkv": {},
+   "transformer_blocks.4.attn.output_softmax_quant": {
+     "act_scale": 0.004101562313735485,
+     "act_scale_shape": [],
+     "act_zp": 0.0,
+     "act_zp_shape": [],
+     "act_zp_dtype": "torch.float8_e4m3fnuz"
+   },
+   "transformer_blocks.4.attn.out_q": {
+     "act_scale": 0.13333334028720856,
+     "act_scale_shape": [],
+     "act_zp": 0.0,
+     "act_zp_shape": [],
+     "act_zp_dtype": "torch.float8_e4m3fnuz"
+   },
+   "transformer_blocks.4.attn.out_k": {
+     "act_scale": 0.10572917014360428,
+     "act_scale_shape": [],
+     "act_zp": 0.0,
+     "act_zp_shape": [],
+     "act_zp_dtype": "torch.float8_e4m3fnuz"
+   },
+   "transformer_blocks.4.attn.out_v": {
+     "act_scale": 0.05156249925494194,
+     "act_scale_shape": [],
+     "act_zp": 0.0,
+     "act_zp_shape": [],
+     "act_zp_dtype": "torch.float8_e4m3fnuz"
+   },
+   "transformer_blocks.4.ff.net.0.proj": {},
+   "transformer_blocks.4.ff.net.2": {},
+   "transformer_blocks.4.ff_context.net.0.proj": {},
+   "transformer_blocks.4.ff_context.net.2": {},
+   "transformer_blocks.5.norm1.linear": {},
+   "transformer_blocks.5.norm1_context.linear": {},
+   "transformer_blocks.5.attn.to_q": {},
+   "transformer_blocks.5.attn.to_k": {},
+   "transformer_blocks.5.attn.to_v": {},
+   "transformer_blocks.5.attn.add_k_proj": {},
+   "transformer_blocks.5.attn.add_v_proj": {},
+   "transformer_blocks.5.attn.add_q_proj": {},
+   "transformer_blocks.5.attn.to_out.0": {},
+   "transformer_blocks.5.attn.to_add_out": {},
+   "transformer_blocks.5.attn.to_qkv": {},
+   "transformer_blocks.5.attn.to_added_qkv": {},
+   "transformer_blocks.5.attn.output_softmax_quant": {
+     "act_scale": 0.0038736979477107525,
+     "act_scale_shape": [],
+     "act_zp": 0.0,
+     "act_zp_shape": [],
+     "act_zp_dtype": "torch.float8_e4m3fnuz"
+   },
+   "transformer_blocks.5.attn.out_q": {
+     "act_scale": 0.06666667014360428,
+     "act_scale_shape": [],
+     "act_zp": 0.0,
+     "act_zp_shape": [],
+     "act_zp_dtype": "torch.float8_e4m3fnuz"
+   },
+   "transformer_blocks.5.attn.out_k": {
+     "act_scale": 0.13333334028720856,
+     "act_scale_shape": [],
+     "act_zp": 0.0,
+     "act_zp_shape": [],
+     "act_zp_dtype": "torch.float8_e4m3fnuz"
+   },
+   "transformer_blocks.5.attn.out_v": {
+     "act_scale": 0.04713541641831398,
+     "act_scale_shape": [],
+     "act_zp": 0.0,
+     "act_zp_shape": [],
+     "act_zp_dtype": "torch.float8_e4m3fnuz"
+   },
+   "transformer_blocks.5.ff.net.0.proj": {},
+   "transformer_blocks.5.ff.net.2": {},
+   "transformer_blocks.5.ff_context.net.0.proj": {},
+   "transformer_blocks.5.ff_context.net.2": {},
+   "transformer_blocks.6.norm1.linear": {},
+   "transformer_blocks.6.norm1_context.linear": {},
+   "transformer_blocks.6.attn.to_q": {},
+   "transformer_blocks.6.attn.to_k": {},
+   "transformer_blocks.6.attn.to_v": {},
+   "transformer_blocks.6.attn.add_k_proj": {},
+   "transformer_blocks.6.attn.add_v_proj": {},
+   "transformer_blocks.6.attn.add_q_proj": {},
+   "transformer_blocks.6.attn.to_out.0": {},
+   "transformer_blocks.6.attn.to_add_out": {},
+   "transformer_blocks.6.attn.to_qkv": {},
+   "transformer_blocks.6.attn.to_added_qkv": {},
+   "transformer_blocks.6.attn.output_softmax_quant": {
+     "act_scale": 0.0037434895057231188,
+     "act_scale_shape": [],
+     "act_zp": 0.0,
+     "act_zp_shape": [],
+     "act_zp_dtype": "torch.float8_e4m3fnuz"
+   },
+   "transformer_blocks.6.attn.out_q": {
+     "act_scale": 0.06666667014360428,
+     "act_scale_shape": [],
+     "act_zp": 0.0,
+     "act_zp_shape": [],
+     "act_zp_dtype": "torch.float8_e4m3fnuz"
+   },
+   "transformer_blocks.6.attn.out_k": {
+     "act_scale": 0.13333334028720856,
+     "act_scale_shape": [],
+     "act_zp": 0.0,
+     "act_zp_shape": [],
+     "act_zp_dtype": "torch.float8_e4m3fnuz"
+   },
+   "transformer_blocks.6.attn.out_v": {
+     "act_scale": 0.06666667014360428,
+     "act_scale_shape": [],
+     "act_zp": 0.0,
+     "act_zp_shape": [],
+     "act_zp_dtype": "torch.float8_e4m3fnuz"
+   },
+   "transformer_blocks.6.ff.net.0.proj": {},
+   "transformer_blocks.6.ff.net.2": {},
+   "transformer_blocks.6.ff_context.net.0.proj": {},
+   "transformer_blocks.6.ff_context.net.2": {},
+   "transformer_blocks.7.norm1.linear": {},
+   "transformer_blocks.7.norm1_context.linear": {},
+   "transformer_blocks.7.attn.to_q": {},
+   "transformer_blocks.7.attn.to_k": {},
+   "transformer_blocks.7.attn.to_v": {},
+   "transformer_blocks.7.attn.add_k_proj": {},
+   "transformer_blocks.7.attn.add_v_proj": {},
+   "transformer_blocks.7.attn.add_q_proj": {},
+   "transformer_blocks.7.attn.to_out.0": {},
+   "transformer_blocks.7.attn.to_add_out": {},
+   "transformer_blocks.7.attn.to_qkv": {},
+   "transformer_blocks.7.attn.to_added_qkv": {},
+   "transformer_blocks.7.attn.output_softmax_quant": {
+     "act_scale": 0.00390625,
+     "act_scale_shape": [],
+     "act_zp": 0.0,
+     "act_zp_shape": [],
+     "act_zp_dtype": "torch.float8_e4m3fnuz"
+   },
+   "transformer_blocks.7.attn.out_q": {
+     "act_scale": 0.10104166716337204,
+     "act_scale_shape": [],
+     "act_zp": 0.0,
+     "act_zp_shape": [],
+     "act_zp_dtype": "torch.float8_e4m3fnuz"
+   },
+   "transformer_blocks.7.attn.out_k": {
+     "act_scale": 0.15729166567325592,
+     "act_scale_shape": [],
+     "act_zp": 0.0,
+     "act_zp_shape": [],
+     "act_zp_dtype": "torch.float8_e4m3fnuz"
+   },
+   "transformer_blocks.7.attn.out_v": {
+     "act_scale": 0.06640625,
+     "act_scale_shape": [],
+     "act_zp": 0.0,
+     "act_zp_shape": [],
+     "act_zp_dtype": "torch.float8_e4m3fnuz"
+   },
+   "transformer_blocks.7.ff.net.0.proj": {},
+   "transformer_blocks.7.ff.net.2": {},
+   "transformer_blocks.7.ff_context.net.0.proj": {},
+   "transformer_blocks.7.ff_context.net.2": {},
+   "transformer_blocks.8.norm1.linear": {},
+   "transformer_blocks.8.norm1_context.linear": {},
+   "transformer_blocks.8.attn.to_q": {},
+   "transformer_blocks.8.attn.to_k": {},
+   "transformer_blocks.8.attn.to_v": {},
+   "transformer_blocks.8.attn.add_k_proj": {},
+   "transformer_blocks.8.attn.add_v_proj": {},
+   "transformer_blocks.8.attn.add_q_proj": {},
+   "transformer_blocks.8.attn.to_out.0": {},
+   "transformer_blocks.8.attn.to_add_out": {},
+   "transformer_blocks.8.attn.to_qkv": {},
+   "transformer_blocks.8.attn.to_added_qkv": {},
+   "transformer_blocks.8.attn.output_softmax_quant": {
+     "act_scale": 0.003841145895421505,
+     "act_scale_shape": [],
+     "act_zp": 0.0,
+     "act_zp_shape": [],
+     "act_zp_dtype": "torch.float8_e4m3fnuz"
+   },
+   "transformer_blocks.8.attn.out_q": {
+     "act_scale": 0.08385416865348816,
+     "act_scale_shape": [],
+     "act_zp": 0.0,
+     "act_zp_shape": [],
+     "act_zp_dtype": "torch.float8_e4m3fnuz"
+   },
+   "transformer_blocks.8.attn.out_k": {
+     "act_scale": 0.13333334028720856,
+     "act_scale_shape": [],
+     "act_zp": 0.0,
+     "act_zp_shape": [],
+     "act_zp_dtype": "torch.float8_e4m3fnuz"
+   },
+   "transformer_blocks.8.attn.out_v": {
+     "act_scale": 0.05494791641831398,
+     "act_scale_shape": [],
+     "act_zp": 0.0,
+     "act_zp_shape": [],
+     "act_zp_dtype": "torch.float8_e4m3fnuz"
+   },
+   "transformer_blocks.8.ff.net.0.proj": {},
+   "transformer_blocks.8.ff.net.2": {},
+   "transformer_blocks.8.ff_context.net.0.proj": {},
+   "transformer_blocks.8.ff_context.net.2": {},
+   "transformer_blocks.9.norm1.linear": {},
+   "transformer_blocks.9.norm1_context.linear": {},
+   "transformer_blocks.9.attn.to_q": {},
+   "transformer_blocks.9.attn.to_k": {},
+   "transformer_blocks.9.attn.to_v": {},
+   "transformer_blocks.9.attn.add_k_proj": {},
+   "transformer_blocks.9.attn.add_v_proj": {},
+   "transformer_blocks.9.attn.add_q_proj": {},
+   "transformer_blocks.9.attn.to_out.0": {},
+   "transformer_blocks.9.attn.to_add_out": {},
+   "transformer_blocks.9.attn.to_qkv": {},
+   "transformer_blocks.9.attn.to_added_qkv": {},
+   "transformer_blocks.9.attn.output_softmax_quant": {
+     "act_scale": 0.0035807292442768812,
+     "act_scale_shape": [],
+     "act_zp": 0.0,
+     "act_zp_shape": [],
+     "act_zp_dtype": "torch.float8_e4m3fnuz"
+   },
+   "transformer_blocks.9.attn.out_q": {
+     "act_scale": 0.06666667014360428,
+     "act_scale_shape": [],
+     "act_zp": 0.0,
+     "act_zp_shape": [],
+     "act_zp_dtype": "torch.float8_e4m3fnuz"
+   },
+   "transformer_blocks.9.attn.out_k": {
+     "act_scale": 0.06666667014360428,
+     "act_scale_shape": [],
+     "act_zp": 0.0,
+     "act_zp_shape": [],
+     "act_zp_dtype": "torch.float8_e4m3fnuz"
+   },
+   "transformer_blocks.9.attn.out_v": {
+     "act_scale": 0.05651041492819786,
+     "act_scale_shape": [],
+     "act_zp": 0.0,
+     "act_zp_shape": [],
+     "act_zp_dtype": "torch.float8_e4m3fnuz"
+   },
+   "transformer_blocks.9.ff.net.0.proj": {},
+   "transformer_blocks.9.ff.net.2": {},
+   "transformer_blocks.9.ff_context.net.0.proj": {},
+   "transformer_blocks.9.ff_context.net.2": {},
+   "transformer_blocks.10.norm1.linear": {},
+   "transformer_blocks.10.norm1_context.linear": {},
+   "transformer_blocks.10.attn.to_q": {},
+   "transformer_blocks.10.attn.to_k": {},
+   "transformer_blocks.10.attn.to_v": {},
+   "transformer_blocks.10.attn.add_k_proj": {},
+   "transformer_blocks.10.attn.add_v_proj": {},
+   "transformer_blocks.10.attn.add_q_proj": {},
+   "transformer_blocks.10.attn.to_out.0": {},
+   "transformer_blocks.10.attn.to_add_out": {},
+   "transformer_blocks.10.attn.to_qkv": {},
+   "transformer_blocks.10.attn.to_added_qkv": {},
+   "transformer_blocks.10.attn.output_softmax_quant": {
+     "act_scale": 0.0028483073692768812,
+     "act_scale_shape": [],
+     "act_zp": 0.0,
+     "act_zp_shape": [],
+     "act_zp_dtype": "torch.float8_e4m3fnuz"
+   },
+   "transformer_blocks.10.attn.out_q": {
+     "act_scale": 0.11145833134651184,
+     "act_scale_shape": [],
+     "act_zp": 0.0,
+     "act_zp_shape": [],
+     "act_zp_dtype": "torch.float8_e4m3fnuz"
+   },
+   "transformer_blocks.10.attn.out_k": {
+     "act_scale": 0.13333334028720856,
+     "act_scale_shape": [],
+     "act_zp": 0.0,
+     "act_zp_shape": [],
+     "act_zp_dtype": "torch.float8_e4m3fnuz"
+   },
+   "transformer_blocks.10.attn.out_v": {
+     "act_scale": 0.06328125298023224,
+     "act_scale_shape": [],
+     "act_zp": 0.0,
+     "act_zp_shape": [],
+     "act_zp_dtype": "torch.float8_e4m3fnuz"
+   },
+   "transformer_blocks.10.ff.net.0.proj": {},
+   "transformer_blocks.10.ff.net.2": {},
+   "transformer_blocks.10.ff_context.net.0.proj": {},
+   "transformer_blocks.10.ff_context.net.2": {},
+   "transformer_blocks.11.norm1.linear": {},
+   "transformer_blocks.11.norm1_context.linear": {},
+   "transformer_blocks.11.attn.to_q": {},
+   "transformer_blocks.11.attn.to_k": {},
+   "transformer_blocks.11.attn.to_v": {},
+   "transformer_blocks.11.attn.add_k_proj": {},
+   "transformer_blocks.11.attn.add_v_proj": {},
+   "transformer_blocks.11.attn.add_q_proj": {},
+   "transformer_blocks.11.attn.to_out.0": {},
+   "transformer_blocks.11.attn.to_add_out": {},
+   "transformer_blocks.11.attn.to_qkv": {},
+   "transformer_blocks.11.attn.to_added_qkv": {},
+   "transformer_blocks.11.attn.output_softmax_quant": {
+     "act_scale": 0.003597005270421505,
+     "act_scale_shape": [],
+     "act_zp": 0.0,
+     "act_zp_shape": [],
+     "act_zp_dtype": "torch.float8_e4m3fnuz"
+   },
+   "transformer_blocks.11.attn.out_q": {
+     "act_scale": 0.09947916865348816,
+     "act_scale_shape": [],
+     "act_zp": 0.0,
+     "act_zp_shape": [],
+     "act_zp_dtype": "torch.float8_e4m3fnuz"
+   },
+   "transformer_blocks.11.attn.out_k": {
+     "act_scale": 0.16458334028720856,
+     "act_scale_shape": [],
+     "act_zp": 0.0,
+     "act_zp_shape": [],
+     "act_zp_dtype": "torch.float8_e4m3fnuz"
+   },
+   "transformer_blocks.11.attn.out_v": {
+     "act_scale": 0.05885416641831398,
+     "act_scale_shape": [],
+     "act_zp": 0.0,
+     "act_zp_shape": [],
+     "act_zp_dtype": "torch.float8_e4m3fnuz"
+   },
+   "transformer_blocks.11.ff.net.0.proj": {},
+   "transformer_blocks.11.ff.net.2": {},
+   "transformer_blocks.11.ff_context.net.0.proj": {},
+   "transformer_blocks.11.ff_context.net.2": {},
+   "transformer_blocks.12.norm1.linear": {},
+   "transformer_blocks.12.norm1_context.linear": {},
+   "transformer_blocks.12.attn.to_q": {},
+   "transformer_blocks.12.attn.to_k": {},
+   "transformer_blocks.12.attn.to_v": {},
+   "transformer_blocks.12.attn.add_k_proj": {},
+   "transformer_blocks.12.attn.add_v_proj": {},
+   "transformer_blocks.12.attn.add_q_proj": {},
+   "transformer_blocks.12.attn.to_out.0": {},
+   "transformer_blocks.12.attn.to_add_out": {},
+   "transformer_blocks.12.attn.to_qkv": {},
+   "transformer_blocks.12.attn.to_added_qkv": {},
+   "transformer_blocks.12.attn.output_softmax_quant": {
+     "act_scale": 0.0038085938431322575,
+     "act_scale_shape": [],
+     "act_zp": 0.0,
+     "act_zp_shape": [],
+     "act_zp_dtype": "torch.float8_e4m3fnuz"
+   },
+   "transformer_blocks.12.attn.out_q": {
+     "act_scale": 0.08697916567325592,
+     "act_scale_shape": [],
+     "act_zp": 0.0,
+     "act_zp_shape": [],
+     "act_zp_dtype": "torch.float8_e4m3fnuz"
+   },
+   "transformer_blocks.12.attn.out_k": {
+     "act_scale": 0.08749999850988388,
+     "act_scale_shape": [],
+     "act_zp": 0.0,
+     "act_zp_shape": [],
+     "act_zp_dtype": "torch.float8_e4m3fnuz"
+   },
+   "transformer_blocks.12.attn.out_v": {
+     "act_scale": 0.0598958320915699,
+     "act_scale_shape": [],
+     "act_zp": 0.0,
+     "act_zp_shape": [],
+     "act_zp_dtype": "torch.float8_e4m3fnuz"
+   },
+   "transformer_blocks.12.ff.net.0.proj": {},
+   "transformer_blocks.12.ff.net.2": {},
+   "transformer_blocks.12.ff_context.net.0.proj": {},
+   "transformer_blocks.12.ff_context.net.2": {},
+   "transformer_blocks.13.norm1.linear": {},
+   "transformer_blocks.13.norm1_context.linear": {},
+   "transformer_blocks.13.attn.to_q": {},
+   "transformer_blocks.13.attn.to_k": {},
+   "transformer_blocks.13.attn.to_v": {},
+   "transformer_blocks.13.attn.add_k_proj": {},
+   "transformer_blocks.13.attn.add_v_proj": {},
+   "transformer_blocks.13.attn.add_q_proj": {},
+   "transformer_blocks.13.attn.to_out.0": {},
+   "transformer_blocks.13.attn.to_add_out": {},
+   "transformer_blocks.13.attn.to_qkv": {},
+   "transformer_blocks.13.attn.to_added_qkv": {},
+   "transformer_blocks.13.attn.output_softmax_quant": {
+     "act_scale": 0.0037760415580123663,
+     "act_scale_shape": [],
+     "act_zp": 0.0,
+     "act_zp_shape": [],
+     "act_zp_dtype": "torch.float8_e4m3fnuz"
+   },
+   "transformer_blocks.13.attn.out_q": {
+     "act_scale": 0.0833333358168602,
+     "act_scale_shape": [],
+     "act_zp": 0.0,
+     "act_zp_shape": [],
+     "act_zp_dtype": "torch.float8_e4m3fnuz"
+   },
+   "transformer_blocks.13.attn.out_k": {
+     "act_scale": 0.11770833283662796,
+     "act_scale_shape": [],
+     "act_zp": 0.0,
+     "act_zp_shape": [],
+     "act_zp_dtype": "torch.float8_e4m3fnuz"
+   },
+   "transformer_blocks.13.attn.out_v": {
+     "act_scale": 0.06666667014360428,
+     "act_scale_shape": [],
+     "act_zp": 0.0,
+     "act_zp_shape": [],
+     "act_zp_dtype": "torch.float8_e4m3fnuz"
+   },
+   "transformer_blocks.13.ff.net.0.proj": {},
+   "transformer_blocks.13.ff.net.2": {},
+   "transformer_blocks.13.ff_context.net.0.proj": {},
+   "transformer_blocks.13.ff_context.net.2": {},
+   "transformer_blocks.14.norm1.linear": {},
+   "transformer_blocks.14.norm1_context.linear": {},
+   "transformer_blocks.14.attn.to_q": {},
+   "transformer_blocks.14.attn.to_k": {},
+   "transformer_blocks.14.attn.to_v": {},
+   "transformer_blocks.14.attn.add_k_proj": {},
+   "transformer_blocks.14.attn.add_v_proj": {},
+   "transformer_blocks.14.attn.add_q_proj": {},
+   "transformer_blocks.14.attn.to_out.0": {},
+   "transformer_blocks.14.attn.to_add_out": {},
+   "transformer_blocks.14.attn.to_qkv": {},
+   "transformer_blocks.14.attn.to_added_qkv": {},
+   "transformer_blocks.14.attn.output_softmax_quant": {
+     "act_scale": 0.003955078311264515,
+     "act_scale_shape": [],
+     "act_zp": 0.0,
+     "act_zp_shape": [],
+     "act_zp_dtype": "torch.float8_e4m3fnuz"
+   },
+   "transformer_blocks.14.attn.out_q": {
+     "act_scale": 0.13333334028720856,
+     "act_scale_shape": [],
+     "act_zp": 0.0,
+     "act_zp_shape": [],
+     "act_zp_dtype": "torch.float8_e4m3fnuz"
+   },
+   "transformer_blocks.14.attn.out_k": {
+     "act_scale": 0.2666666805744171,
+     "act_scale_shape": [],
+     "act_zp": 0.0,
+     "act_zp_shape": [],
+     "act_zp_dtype": "torch.float8_e4m3fnuz"
+   },
+   "transformer_blocks.14.attn.out_v": {
+     "act_scale": 0.1067708358168602,
+     "act_scale_shape": [],
+     "act_zp": 0.0,
+     "act_zp_shape": [],
+     "act_zp_dtype": "torch.float8_e4m3fnuz"
+   },
+   "transformer_blocks.14.ff.net.0.proj": {},
+   "transformer_blocks.14.ff.net.2": {},
+   "transformer_blocks.14.ff_context.net.0.proj": {},
+   "transformer_blocks.14.ff_context.net.2": {},
+   "transformer_blocks.15.norm1.linear": {},
+   "transformer_blocks.15.norm1_context.linear": {},
+   "transformer_blocks.15.attn.to_q": {},
+   "transformer_blocks.15.attn.to_k": {},
+   "transformer_blocks.15.attn.to_v": {},
+   "transformer_blocks.15.attn.add_k_proj": {},
+   "transformer_blocks.15.attn.add_v_proj": {},
+   "transformer_blocks.15.attn.add_q_proj": {},
+   "transformer_blocks.15.attn.to_out.0": {},
+   "transformer_blocks.15.attn.to_add_out": {},
+   "transformer_blocks.15.attn.to_qkv": {},
+   "transformer_blocks.15.attn.to_added_qkv": {},
+   "transformer_blocks.15.attn.output_softmax_quant": {
+     "act_scale": 0.003971354104578495,
+     "act_scale_shape": [],
+     "act_zp": 0.0,
+     "act_zp_shape": [],
+     "act_zp_dtype": "torch.float8_e4m3fnuz"
+   },
+   "transformer_blocks.15.attn.out_q": {
+     "act_scale": 0.15833333134651184,
+     "act_scale_shape": [],
+     "act_zp": 0.0,
+     "act_zp_shape": [],
+     "act_zp_dtype": "torch.float8_e4m3fnuz"
+   },
+   "transformer_blocks.15.attn.out_k": {
+     "act_scale": 0.20624999701976776,
+     "act_scale_shape": [],
+     "act_zp": 0.0,
+     "act_zp_shape": [],
+     "act_zp_dtype": "torch.float8_e4m3fnuz"
+   },
+   "transformer_blocks.15.attn.out_v": {
+     "act_scale": 0.07239583134651184,
+     "act_scale_shape": [],
+     "act_zp": 0.0,
+     "act_zp_shape": [],
+     "act_zp_dtype": "torch.float8_e4m3fnuz"
+   },
+   "transformer_blocks.15.ff.net.0.proj": {},
+   "transformer_blocks.15.ff.net.2": {},
+   "transformer_blocks.15.ff_context.net.0.proj": {},
+   "transformer_blocks.15.ff_context.net.2": {},
+   "transformer_blocks.16.norm1.linear": {},
+   "transformer_blocks.16.norm1_context.linear": {},
+   "transformer_blocks.16.attn.to_q": {},
+   "transformer_blocks.16.attn.to_k": {},
+   "transformer_blocks.16.attn.to_v": {},
+   "transformer_blocks.16.attn.add_k_proj": {},
+   "transformer_blocks.16.attn.add_v_proj": {},
+   "transformer_blocks.16.attn.add_q_proj": {},
+   "transformer_blocks.16.attn.to_out.0": {},
+   "transformer_blocks.16.attn.to_add_out": {},
+   "transformer_blocks.16.attn.to_qkv": {},
+   "transformer_blocks.16.attn.to_added_qkv": {},
+   "transformer_blocks.16.attn.output_softmax_quant": {
+     "act_scale": 0.00390625,
+     "act_scale_shape": [],
+     "act_zp": 0.0,
+     "act_zp_shape": [],
+     "act_zp_dtype": "torch.float8_e4m3fnuz"
+   },
+   "transformer_blocks.16.attn.out_q": {
+     "act_scale": 0.16875000298023224,
+     "act_scale_shape": [],
+     "act_zp": 0.0,
+     "act_zp_shape": [],
+     "act_zp_dtype": "torch.float8_e4m3fnuz"
+   },
+   "transformer_blocks.16.attn.out_k": {
+     "act_scale": 0.18333333730697632,
+     "act_scale_shape": [],
+     "act_zp": 0.0,
+     "act_zp_shape": [],
+     "act_zp_dtype": "torch.float8_e4m3fnuz"
+   },
+   "transformer_blocks.16.attn.out_v": {
+     "act_scale": 0.08906249701976776,
+     "act_scale_shape": [],
+     "act_zp": 0.0,
+     "act_zp_shape": [],
+     "act_zp_dtype": "torch.float8_e4m3fnuz"
+   },
+   "transformer_blocks.16.ff.net.0.proj": {},
+   "transformer_blocks.16.ff.net.2": {},
+   "transformer_blocks.16.ff_context.net.0.proj": {},
+   "transformer_blocks.16.ff_context.net.2": {},
+   "transformer_blocks.17.norm1.linear": {},
+   "transformer_blocks.17.norm1_context.linear": {},
+   "transformer_blocks.17.attn.to_q": {},
+   "transformer_blocks.17.attn.to_k": {},
+   "transformer_blocks.17.attn.to_v": {},
+   "transformer_blocks.17.attn.add_k_proj": {},
+   "transformer_blocks.17.attn.add_v_proj": {},
+   "transformer_blocks.17.attn.add_q_proj": {},
+   "transformer_blocks.17.attn.to_out.0": {},
+   "transformer_blocks.17.attn.to_add_out": {},
+   "transformer_blocks.17.attn.to_qkv": {},
+   "transformer_blocks.17.attn.to_added_qkv": {},
+   "transformer_blocks.17.attn.output_softmax_quant": {
+     "act_scale": 0.0039388020522892475,
+     "act_scale_shape": [],
+     "act_zp": 0.0,
+     "act_zp_shape": [],
+     "act_zp_dtype": "torch.float8_e4m3fnuz"
+   },
+   "transformer_blocks.17.attn.out_q": {
+     "act_scale": 0.20624999701976776,
+     "act_scale_shape": [],
+     "act_zp": 0.0,
+     "act_zp_shape": [],
+     "act_zp_dtype": "torch.float8_e4m3fnuz"
783
+ },
784
+ "transformer_blocks.17.attn.out_k": {
785
+ "act_scale": 0.20729166269302368,
786
+ "act_scale_shape": [],
787
+ "act_zp": 0.0,
788
+ "act_zp_shape": [],
789
+ "act_zp_dtype": "torch.float8_e4m3fnuz"
790
+ },
791
+ "transformer_blocks.17.attn.out_v": {
792
+ "act_scale": 0.08281250298023224,
793
+ "act_scale_shape": [],
794
+ "act_zp": 0.0,
795
+ "act_zp_shape": [],
796
+ "act_zp_dtype": "torch.float8_e4m3fnuz"
797
+ },
798
+ "transformer_blocks.17.ff.net.0.proj": {},
799
+ "transformer_blocks.17.ff.net.2": {},
800
+ "transformer_blocks.17.ff_context.net.0.proj": {},
801
+ "transformer_blocks.17.ff_context.net.2": {},
802
+ "transformer_blocks.18.norm1.linear": {},
803
+ "transformer_blocks.18.norm1_context.linear": {},
804
+ "transformer_blocks.18.attn.to_q": {},
805
+ "transformer_blocks.18.attn.to_k": {},
806
+ "transformer_blocks.18.attn.to_v": {},
807
+ "transformer_blocks.18.attn.add_k_proj": {},
808
+ "transformer_blocks.18.attn.add_v_proj": {},
809
+ "transformer_blocks.18.attn.add_q_proj": {},
810
+ "transformer_blocks.18.attn.to_out.0": {},
811
+ "transformer_blocks.18.attn.to_add_out": {},
812
+ "transformer_blocks.18.attn.to_qkv": {},
813
+ "transformer_blocks.18.attn.to_added_qkv": {},
814
+ "transformer_blocks.18.attn.output_softmax_quant": {
815
+ "act_scale": 0.004166666883975267,
816
+ "act_scale_shape": [],
817
+ "act_zp": 0.0,
818
+ "act_zp_shape": [],
819
+ "act_zp_dtype": "torch.float8_e4m3fnuz"
820
+ },
821
+ "transformer_blocks.18.attn.out_q": {
822
+ "act_scale": 0.15833333134651184,
823
+ "act_scale_shape": [],
824
+ "act_zp": 0.0,
825
+ "act_zp_shape": [],
826
+ "act_zp_dtype": "torch.float8_e4m3fnuz"
827
+ },
828
+ "transformer_blocks.18.attn.out_k": {
829
+ "act_scale": 0.18645833432674408,
830
+ "act_scale_shape": [],
831
+ "act_zp": 0.0,
832
+ "act_zp_shape": [],
833
+ "act_zp_dtype": "torch.float8_e4m3fnuz"
834
+ },
835
+ "transformer_blocks.18.attn.out_v": {
836
+ "act_scale": 0.09583333134651184,
837
+ "act_scale_shape": [],
838
+ "act_zp": 0.0,
839
+ "act_zp_shape": [],
840
+ "act_zp_dtype": "torch.float8_e4m3fnuz"
841
+ },
842
+ "transformer_blocks.18.ff.net.0.proj": {},
843
+ "transformer_blocks.18.ff.net.2": {},
844
+ "transformer_blocks.18.ff_context.net.0.proj": {},
845
+ "transformer_blocks.18.ff_context.net.2": {},
846
+ "single_transformer_blocks.0.norm.linear": {},
847
+ "single_transformer_blocks.0.proj_mlp": {},
848
+ "single_transformer_blocks.0.proj_out": {},
849
+ "single_transformer_blocks.0.attn.to_q": {},
850
+ "single_transformer_blocks.0.attn.to_k": {},
851
+ "single_transformer_blocks.0.attn.to_v": {},
852
+ "single_transformer_blocks.0.attn.to_qkv": {},
853
+ "single_transformer_blocks.0.attn.output_softmax_quant": {
854
+ "act_scale": 0.003483072854578495,
855
+ "act_scale_shape": [],
856
+ "act_zp": 0.0,
857
+ "act_zp_shape": [],
858
+ "act_zp_dtype": "torch.float8_e4m3fnuz"
859
+ },
860
+ "single_transformer_blocks.0.attn.out_q": {
861
+ "act_scale": 0.12343750149011612,
862
+ "act_scale_shape": [],
863
+ "act_zp": 0.0,
864
+ "act_zp_shape": [],
865
+ "act_zp_dtype": "torch.float8_e4m3fnuz"
866
+ },
867
+ "single_transformer_blocks.0.attn.out_k": {
868
+ "act_scale": 0.10104166716337204,
869
+ "act_scale_shape": [],
870
+ "act_zp": 0.0,
871
+ "act_zp_shape": [],
872
+ "act_zp_dtype": "torch.float8_e4m3fnuz"
873
+ },
874
+ "single_transformer_blocks.0.attn.out_v": {
875
+ "act_scale": 0.05885416641831398,
876
+ "act_scale_shape": [],
877
+ "act_zp": 0.0,
878
+ "act_zp_shape": [],
879
+ "act_zp_dtype": "torch.float8_e4m3fnuz"
880
+ },
881
+ "single_transformer_blocks.1.norm.linear": {},
882
+ "single_transformer_blocks.1.proj_mlp": {},
883
+ "single_transformer_blocks.1.proj_out": {},
884
+ "single_transformer_blocks.1.attn.to_q": {},
885
+ "single_transformer_blocks.1.attn.to_k": {},
886
+ "single_transformer_blocks.1.attn.to_v": {},
887
+ "single_transformer_blocks.1.attn.to_qkv": {},
888
+ "single_transformer_blocks.1.attn.output_softmax_quant": {
889
+ "act_scale": 0.0034505208022892475,
890
+ "act_scale_shape": [],
891
+ "act_zp": 0.0,
892
+ "act_zp_shape": [],
893
+ "act_zp_dtype": "torch.float8_e4m3fnuz"
894
+ },
895
+ "single_transformer_blocks.1.attn.out_q": {
896
+ "act_scale": 0.06666667014360428,
897
+ "act_scale_shape": [],
898
+ "act_zp": 0.0,
899
+ "act_zp_shape": [],
900
+ "act_zp_dtype": "torch.float8_e4m3fnuz"
901
+ },
902
+ "single_transformer_blocks.1.attn.out_k": {
903
+ "act_scale": 0.06666667014360428,
904
+ "act_scale_shape": [],
905
+ "act_zp": 0.0,
906
+ "act_zp_shape": [],
907
+ "act_zp_dtype": "torch.float8_e4m3fnuz"
908
+ },
909
+ "single_transformer_blocks.1.attn.out_v": {
910
+ "act_scale": 0.05781250074505806,
911
+ "act_scale_shape": [],
912
+ "act_zp": 0.0,
913
+ "act_zp_shape": [],
914
+ "act_zp_dtype": "torch.float8_e4m3fnuz"
915
+ },
916
+ "single_transformer_blocks.2.norm.linear": {},
917
+ "single_transformer_blocks.2.proj_mlp": {},
918
+ "single_transformer_blocks.2.proj_out": {},
919
+ "single_transformer_blocks.2.attn.to_q": {},
920
+ "single_transformer_blocks.2.attn.to_k": {},
921
+ "single_transformer_blocks.2.attn.to_v": {},
922
+ "single_transformer_blocks.2.attn.to_qkv": {},
923
+ "single_transformer_blocks.2.attn.output_softmax_quant": {
924
+ "act_scale": 0.00390625,
925
+ "act_scale_shape": [],
926
+ "act_zp": 0.0,
927
+ "act_zp_shape": [],
928
+ "act_zp_dtype": "torch.float8_e4m3fnuz"
929
+ },
930
+ "single_transformer_blocks.2.attn.out_q": {
931
+ "act_scale": 0.06484375149011612,
932
+ "act_scale_shape": [],
933
+ "act_zp": 0.0,
934
+ "act_zp_shape": [],
935
+ "act_zp_dtype": "torch.float8_e4m3fnuz"
936
+ },
937
+ "single_transformer_blocks.2.attn.out_k": {
938
+ "act_scale": 0.06666667014360428,
939
+ "act_scale_shape": [],
940
+ "act_zp": 0.0,
941
+ "act_zp_shape": [],
942
+ "act_zp_dtype": "torch.float8_e4m3fnuz"
943
+ },
944
+ "single_transformer_blocks.2.attn.out_v": {
945
+ "act_scale": 0.0481770820915699,
946
+ "act_scale_shape": [],
947
+ "act_zp": 0.0,
948
+ "act_zp_shape": [],
949
+ "act_zp_dtype": "torch.float8_e4m3fnuz"
950
+ },
951
+ "single_transformer_blocks.3.norm.linear": {},
952
+ "single_transformer_blocks.3.proj_mlp": {},
953
+ "single_transformer_blocks.3.proj_out": {},
954
+ "single_transformer_blocks.3.attn.to_q": {},
955
+ "single_transformer_blocks.3.attn.to_k": {},
956
+ "single_transformer_blocks.3.attn.to_v": {},
957
+ "single_transformer_blocks.3.attn.to_qkv": {},
958
+ "single_transformer_blocks.3.attn.output_softmax_quant": {
959
+ "act_scale": 0.0036783854011446238,
960
+ "act_scale_shape": [],
961
+ "act_zp": 0.0,
962
+ "act_zp_shape": [],
963
+ "act_zp_dtype": "torch.float8_e4m3fnuz"
964
+ },
965
+ "single_transformer_blocks.3.attn.out_q": {
966
+ "act_scale": 0.06666667014360428,
967
+ "act_scale_shape": [],
968
+ "act_zp": 0.0,
969
+ "act_zp_shape": [],
970
+ "act_zp_dtype": "torch.float8_e4m3fnuz"
971
+ },
972
+ "single_transformer_blocks.3.attn.out_k": {
973
+ "act_scale": 0.06197916716337204,
974
+ "act_scale_shape": [],
975
+ "act_zp": 0.0,
976
+ "act_zp_shape": [],
977
+ "act_zp_dtype": "torch.float8_e4m3fnuz"
978
+ },
979
+ "single_transformer_blocks.3.attn.out_v": {
980
+ "act_scale": 0.04322916641831398,
981
+ "act_scale_shape": [],
982
+ "act_zp": 0.0,
983
+ "act_zp_shape": [],
984
+ "act_zp_dtype": "torch.float8_e4m3fnuz"
985
+ },
986
+ "single_transformer_blocks.4.norm.linear": {},
987
+ "single_transformer_blocks.4.proj_mlp": {},
988
+ "single_transformer_blocks.4.proj_out": {},
989
+ "single_transformer_blocks.4.attn.to_q": {},
990
+ "single_transformer_blocks.4.attn.to_k": {},
991
+ "single_transformer_blocks.4.attn.to_v": {},
992
+ "single_transformer_blocks.4.attn.to_qkv": {},
993
+ "single_transformer_blocks.4.attn.output_softmax_quant": {
994
+ "act_scale": 0.0037923178169876337,
995
+ "act_scale_shape": [],
996
+ "act_zp": 0.0,
997
+ "act_zp_shape": [],
998
+ "act_zp_dtype": "torch.float8_e4m3fnuz"
999
+ },
1000
+ "single_transformer_blocks.4.attn.out_q": {
1001
+ "act_scale": 0.06588541716337204,
1002
+ "act_scale_shape": [],
1003
+ "act_zp": 0.0,
1004
+ "act_zp_shape": [],
1005
+ "act_zp_dtype": "torch.float8_e4m3fnuz"
1006
+ },
1007
+ "single_transformer_blocks.4.attn.out_k": {
1008
+ "act_scale": 0.06666667014360428,
1009
+ "act_scale_shape": [],
1010
+ "act_zp": 0.0,
1011
+ "act_zp_shape": [],
1012
+ "act_zp_dtype": "torch.float8_e4m3fnuz"
1013
+ },
1014
+ "single_transformer_blocks.4.attn.out_v": {
1015
+ "act_scale": 0.04270833358168602,
1016
+ "act_scale_shape": [],
1017
+ "act_zp": 0.0,
1018
+ "act_zp_shape": [],
1019
+ "act_zp_dtype": "torch.float8_e4m3fnuz"
1020
+ },
1021
+ "single_transformer_blocks.5.norm.linear": {},
1022
+ "single_transformer_blocks.5.proj_mlp": {},
1023
+ "single_transformer_blocks.5.proj_out": {},
1024
+ "single_transformer_blocks.5.attn.to_q": {},
1025
+ "single_transformer_blocks.5.attn.to_k": {},
1026
+ "single_transformer_blocks.5.attn.to_v": {},
1027
+ "single_transformer_blocks.5.attn.to_qkv": {},
1028
+ "single_transformer_blocks.5.attn.output_softmax_quant": {
1029
+ "act_scale": 0.0036295573227107525,
1030
+ "act_scale_shape": [],
1031
+ "act_zp": 0.0,
1032
+ "act_zp_shape": [],
1033
+ "act_zp_dtype": "torch.float8_e4m3fnuz"
1034
+ },
1035
+ "single_transformer_blocks.5.attn.out_q": {
1036
+ "act_scale": 0.06666667014360428,
1037
+ "act_scale_shape": [],
1038
+ "act_zp": 0.0,
1039
+ "act_zp_shape": [],
1040
+ "act_zp_dtype": "torch.float8_e4m3fnuz"
1041
+ },
1042
+ "single_transformer_blocks.5.attn.out_k": {
1043
+ "act_scale": 0.06640625,
1044
+ "act_scale_shape": [],
1045
+ "act_zp": 0.0,
1046
+ "act_zp_shape": [],
1047
+ "act_zp_dtype": "torch.float8_e4m3fnuz"
1048
+ },
1049
+ "single_transformer_blocks.5.attn.out_v": {
1050
+ "act_scale": 0.04192708432674408,
1051
+ "act_scale_shape": [],
1052
+ "act_zp": 0.0,
1053
+ "act_zp_shape": [],
1054
+ "act_zp_dtype": "torch.float8_e4m3fnuz"
1055
+ },
1056
+ "single_transformer_blocks.6.norm.linear": {},
1057
+ "single_transformer_blocks.6.proj_mlp": {},
1058
+ "single_transformer_blocks.6.proj_out": {},
1059
+ "single_transformer_blocks.6.attn.to_q": {},
1060
+ "single_transformer_blocks.6.attn.to_k": {},
1061
+ "single_transformer_blocks.6.attn.to_v": {},
1062
+ "single_transformer_blocks.6.attn.to_qkv": {},
1063
+ "single_transformer_blocks.6.attn.output_softmax_quant": {
1064
+ "act_scale": 0.003841145895421505,
1065
+ "act_scale_shape": [],
1066
+ "act_zp": 0.0,
1067
+ "act_zp_shape": [],
1068
+ "act_zp_dtype": "torch.float8_e4m3fnuz"
1069
+ },
1070
+ "single_transformer_blocks.6.attn.out_q": {
1071
+ "act_scale": 0.08385416865348816,
1072
+ "act_scale_shape": [],
1073
+ "act_zp": 0.0,
1074
+ "act_zp_shape": [],
1075
+ "act_zp_dtype": "torch.float8_e4m3fnuz"
1076
+ },
1077
+ "single_transformer_blocks.6.attn.out_k": {
1078
+ "act_scale": 0.0638020858168602,
1079
+ "act_scale_shape": [],
1080
+ "act_zp": 0.0,
1081
+ "act_zp_shape": [],
1082
+ "act_zp_dtype": "torch.float8_e4m3fnuz"
1083
+ },
1084
+ "single_transformer_blocks.6.attn.out_v": {
1085
+ "act_scale": 0.04453124850988388,
1086
+ "act_scale_shape": [],
1087
+ "act_zp": 0.0,
1088
+ "act_zp_shape": [],
1089
+ "act_zp_dtype": "torch.float8_e4m3fnuz"
1090
+ },
1091
+ "single_transformer_blocks.7.norm.linear": {},
1092
+ "single_transformer_blocks.7.proj_mlp": {},
1093
+ "single_transformer_blocks.7.proj_out": {},
1094
+ "single_transformer_blocks.7.attn.to_q": {},
1095
+ "single_transformer_blocks.7.attn.to_k": {},
1096
+ "single_transformer_blocks.7.attn.to_v": {},
1097
+ "single_transformer_blocks.7.attn.to_qkv": {},
1098
+ "single_transformer_blocks.7.attn.output_softmax_quant": {
1099
+ "act_scale": 0.0035481771919876337,
1100
+ "act_scale_shape": [],
1101
+ "act_zp": 0.0,
1102
+ "act_zp_shape": [],
1103
+ "act_zp_dtype": "torch.float8_e4m3fnuz"
1104
+ },
1105
+ "single_transformer_blocks.7.attn.out_q": {
1106
+ "act_scale": 0.06406249850988388,
1107
+ "act_scale_shape": [],
1108
+ "act_zp": 0.0,
1109
+ "act_zp_shape": [],
1110
+ "act_zp_dtype": "torch.float8_e4m3fnuz"
1111
+ },
1112
+ "single_transformer_blocks.7.attn.out_k": {
1113
+ "act_scale": 0.0611979179084301,
1114
+ "act_scale_shape": [],
1115
+ "act_zp": 0.0,
1116
+ "act_zp_shape": [],
1117
+ "act_zp_dtype": "torch.float8_e4m3fnuz"
1118
+ },
1119
+ "single_transformer_blocks.7.attn.out_v": {
1120
+ "act_scale": 0.04270833358168602,
1121
+ "act_scale_shape": [],
1122
+ "act_zp": 0.0,
1123
+ "act_zp_shape": [],
1124
+ "act_zp_dtype": "torch.float8_e4m3fnuz"
1125
+ },
1126
+ "single_transformer_blocks.8.norm.linear": {},
1127
+ "single_transformer_blocks.8.proj_mlp": {},
1128
+ "single_transformer_blocks.8.proj_out": {},
1129
+ "single_transformer_blocks.8.attn.to_q": {},
1130
+ "single_transformer_blocks.8.attn.to_k": {},
1131
+ "single_transformer_blocks.8.attn.to_v": {},
1132
+ "single_transformer_blocks.8.attn.to_qkv": {},
1133
+ "single_transformer_blocks.8.attn.output_softmax_quant": {
1134
+ "act_scale": 0.0037760415580123663,
1135
+ "act_scale_shape": [],
1136
+ "act_zp": 0.0,
1137
+ "act_zp_shape": [],
1138
+ "act_zp_dtype": "torch.float8_e4m3fnuz"
1139
+ },
1140
+ "single_transformer_blocks.8.attn.out_q": {
1141
+ "act_scale": 0.08958332985639572,
1142
+ "act_scale_shape": [],
1143
+ "act_zp": 0.0,
1144
+ "act_zp_shape": [],
1145
+ "act_zp_dtype": "torch.float8_e4m3fnuz"
1146
+ },
1147
+ "single_transformer_blocks.8.attn.out_k": {
1148
+ "act_scale": 0.07916666567325592,
1149
+ "act_scale_shape": [],
1150
+ "act_zp": 0.0,
1151
+ "act_zp_shape": [],
1152
+ "act_zp_dtype": "torch.float8_e4m3fnuz"
1153
+ },
1154
+ "single_transformer_blocks.8.attn.out_v": {
1155
+ "act_scale": 0.0416666679084301,
1156
+ "act_scale_shape": [],
1157
+ "act_zp": 0.0,
1158
+ "act_zp_shape": [],
1159
+ "act_zp_dtype": "torch.float8_e4m3fnuz"
1160
+ },
1161
+ "single_transformer_blocks.9.norm.linear": {},
1162
+ "single_transformer_blocks.9.proj_mlp": {},
1163
+ "single_transformer_blocks.9.proj_out": {},
1164
+ "single_transformer_blocks.9.attn.to_q": {},
1165
+ "single_transformer_blocks.9.attn.to_k": {},
1166
+ "single_transformer_blocks.9.attn.to_v": {},
1167
+ "single_transformer_blocks.9.attn.to_qkv": {},
1168
+ "single_transformer_blocks.9.attn.output_softmax_quant": {
1169
+ "act_scale": 0.0038574219215661287,
1170
+ "act_scale_shape": [],
1171
+ "act_zp": 0.0,
1172
+ "act_zp_shape": [],
1173
+ "act_zp_dtype": "torch.float8_e4m3fnuz"
1174
+ },
1175
+ "single_transformer_blocks.9.attn.out_q": {
1176
+ "act_scale": 0.06666667014360428,
1177
+ "act_scale_shape": [],
1178
+ "act_zp": 0.0,
1179
+ "act_zp_shape": [],
1180
+ "act_zp_dtype": "torch.float8_e4m3fnuz"
1181
+ },
1182
+ "single_transformer_blocks.9.attn.out_k": {
1183
+ "act_scale": 0.06197916716337204,
1184
+ "act_scale_shape": [],
1185
+ "act_zp": 0.0,
1186
+ "act_zp_shape": [],
1187
+ "act_zp_dtype": "torch.float8_e4m3fnuz"
1188
+ },
1189
+ "single_transformer_blocks.9.attn.out_v": {
1190
+ "act_scale": 0.04062499850988388,
1191
+ "act_scale_shape": [],
1192
+ "act_zp": 0.0,
1193
+ "act_zp_shape": [],
1194
+ "act_zp_dtype": "torch.float8_e4m3fnuz"
1195
+ },
1196
+ "single_transformer_blocks.10.norm.linear": {},
1197
+ "single_transformer_blocks.10.proj_mlp": {},
1198
+ "single_transformer_blocks.10.proj_out": {},
1199
+ "single_transformer_blocks.10.attn.to_q": {},
1200
+ "single_transformer_blocks.10.attn.to_k": {},
1201
+ "single_transformer_blocks.10.attn.to_v": {},
1202
+ "single_transformer_blocks.10.attn.to_qkv": {},
1203
+ "single_transformer_blocks.10.attn.output_softmax_quant": {
1204
+ "act_scale": 0.003922526258975267,
1205
+ "act_scale_shape": [],
1206
+ "act_zp": 0.0,
1207
+ "act_zp_shape": [],
1208
+ "act_zp_dtype": "torch.float8_e4m3fnuz"
1209
+ },
1210
+ "single_transformer_blocks.10.attn.out_q": {
1211
+ "act_scale": 0.08645833283662796,
1212
+ "act_scale_shape": [],
1213
+ "act_zp": 0.0,
1214
+ "act_zp_shape": [],
1215
+ "act_zp_dtype": "torch.float8_e4m3fnuz"
1216
+ },
1217
+ "single_transformer_blocks.10.attn.out_k": {
1218
+ "act_scale": 0.0572916679084301,
1219
+ "act_scale_shape": [],
1220
+ "act_zp": 0.0,
1221
+ "act_zp_shape": [],
1222
+ "act_zp_dtype": "torch.float8_e4m3fnuz"
1223
+ },
1224
+ "single_transformer_blocks.10.attn.out_v": {
1225
+ "act_scale": 0.04062499850988388,
1226
+ "act_scale_shape": [],
1227
+ "act_zp": 0.0,
1228
+ "act_zp_shape": [],
1229
+ "act_zp_dtype": "torch.float8_e4m3fnuz"
1230
+ },
1231
+ "single_transformer_blocks.11.norm.linear": {},
1232
+ "single_transformer_blocks.11.proj_mlp": {},
1233
+ "single_transformer_blocks.11.proj_out": {},
1234
+ "single_transformer_blocks.11.attn.to_q": {},
1235
+ "single_transformer_blocks.11.attn.to_k": {},
1236
+ "single_transformer_blocks.11.attn.to_v": {},
1237
+ "single_transformer_blocks.11.attn.to_qkv": {},
1238
+ "single_transformer_blocks.11.attn.output_softmax_quant": {
1239
+ "act_scale": 0.00402018241584301,
1240
+ "act_scale_shape": [],
1241
+ "act_zp": 0.0,
1242
+ "act_zp_shape": [],
1243
+ "act_zp_dtype": "torch.float8_e4m3fnuz"
1244
+ },
1245
+ "single_transformer_blocks.11.attn.out_q": {
1246
+ "act_scale": 0.10000000149011612,
1247
+ "act_scale_shape": [],
1248
+ "act_zp": 0.0,
1249
+ "act_zp_shape": [],
1250
+ "act_zp_dtype": "torch.float8_e4m3fnuz"
1251
+ },
1252
+ "single_transformer_blocks.11.attn.out_k": {
1253
+ "act_scale": 0.06666667014360428,
1254
+ "act_scale_shape": [],
1255
+ "act_zp": 0.0,
1256
+ "act_zp_shape": [],
1257
+ "act_zp_dtype": "torch.float8_e4m3fnuz"
1258
+ },
1259
+ "single_transformer_blocks.11.attn.out_v": {
1260
+ "act_scale": 0.04088541492819786,
1261
+ "act_scale_shape": [],
1262
+ "act_zp": 0.0,
1263
+ "act_zp_shape": [],
1264
+ "act_zp_dtype": "torch.float8_e4m3fnuz"
1265
+ },
1266
+ "single_transformer_blocks.12.norm.linear": {},
1267
+ "single_transformer_blocks.12.proj_mlp": {},
1268
+ "single_transformer_blocks.12.proj_out": {},
1269
+ "single_transformer_blocks.12.attn.to_q": {},
1270
+ "single_transformer_blocks.12.attn.to_k": {},
1271
+ "single_transformer_blocks.12.attn.to_v": {},
1272
+ "single_transformer_blocks.12.attn.to_qkv": {},
1273
+ "single_transformer_blocks.12.attn.output_softmax_quant": {
1274
+ "act_scale": 0.0038085938431322575,
1275
+ "act_scale_shape": [],
1276
+ "act_zp": 0.0,
1277
+ "act_zp_shape": [],
1278
+ "act_zp_dtype": "torch.float8_e4m3fnuz"
1279
+ },
1280
+ "single_transformer_blocks.12.attn.out_q": {
1281
+ "act_scale": 0.06666667014360428,
1282
+ "act_scale_shape": [],
1283
+ "act_zp": 0.0,
1284
+ "act_zp_shape": [],
1285
+ "act_zp_dtype": "torch.float8_e4m3fnuz"
1286
+ },
1287
+ "single_transformer_blocks.12.attn.out_k": {
1288
+ "act_scale": 0.06458333134651184,
1289
+ "act_scale_shape": [],
1290
+ "act_zp": 0.0,
1291
+ "act_zp_shape": [],
1292
+ "act_zp_dtype": "torch.float8_e4m3fnuz"
1293
+ },
1294
+ "single_transformer_blocks.12.attn.out_v": {
1295
+ "act_scale": 0.03333333507180214,
1296
+ "act_scale_shape": [],
1297
+ "act_zp": 0.0,
1298
+ "act_zp_shape": [],
1299
+ "act_zp_dtype": "torch.float8_e4m3fnuz"
1300
+ },
1301
+ "single_transformer_blocks.13.norm.linear": {},
1302
+ "single_transformer_blocks.13.proj_mlp": {},
1303
+ "single_transformer_blocks.13.proj_out": {},
1304
+ "single_transformer_blocks.13.attn.to_q": {},
1305
+ "single_transformer_blocks.13.attn.to_k": {},
1306
+ "single_transformer_blocks.13.attn.to_v": {},
1307
+ "single_transformer_blocks.13.attn.to_qkv": {},
1308
+ "single_transformer_blocks.13.attn.output_softmax_quant": {
1309
+ "act_scale": 0.0039876303635537624,
1310
+ "act_scale_shape": [],
1311
+ "act_zp": 0.0,
1312
+ "act_zp_shape": [],
1313
+ "act_zp_dtype": "torch.float8_e4m3fnuz"
1314
+ },
1315
+ "single_transformer_blocks.13.attn.out_q": {
1316
+ "act_scale": 0.10520832985639572,
1317
+ "act_scale_shape": [],
1318
+ "act_zp": 0.0,
1319
+ "act_zp_shape": [],
1320
+ "act_zp_dtype": "torch.float8_e4m3fnuz"
1321
+ },
1322
+ "single_transformer_blocks.13.attn.out_k": {
1323
+ "act_scale": 0.05703125149011612,
1324
+ "act_scale_shape": [],
1325
+ "act_zp": 0.0,
1326
+ "act_zp_shape": [],
1327
+ "act_zp_dtype": "torch.float8_e4m3fnuz"
1328
+ },
1329
+ "single_transformer_blocks.13.attn.out_v": {
1330
+ "act_scale": 0.04218750074505806,
1331
+ "act_scale_shape": [],
1332
+ "act_zp": 0.0,
1333
+ "act_zp_shape": [],
1334
+ "act_zp_dtype": "torch.float8_e4m3fnuz"
1335
+ },
1336
+ "single_transformer_blocks.14.norm.linear": {},
1337
+ "single_transformer_blocks.14.proj_mlp": {},
1338
+ "single_transformer_blocks.14.proj_out": {},
1339
+ "single_transformer_blocks.14.attn.to_q": {},
1340
+ "single_transformer_blocks.14.attn.to_k": {},
1341
+ "single_transformer_blocks.14.attn.to_v": {},
1342
+ "single_transformer_blocks.14.attn.to_qkv": {},
1343
+ "single_transformer_blocks.14.attn.output_softmax_quant": {
1344
+ "act_scale": 0.0040690102614462376,
1345
+ "act_scale_shape": [],
1346
+ "act_zp": 0.0,
1347
+ "act_zp_shape": [],
1348
+ "act_zp_dtype": "torch.float8_e4m3fnuz"
1349
+ },
1350
+ "single_transformer_blocks.14.attn.out_q": {
1351
+ "act_scale": 0.06666667014360428,
1352
+ "act_scale_shape": [],
1353
+ "act_zp": 0.0,
1354
+ "act_zp_shape": [],
1355
+ "act_zp_dtype": "torch.float8_e4m3fnuz"
1356
+ },
1357
+ "single_transformer_blocks.14.attn.out_k": {
1358
+ "act_scale": 0.11302082985639572,
1359
+ "act_scale_shape": [],
1360
+ "act_zp": 0.0,
1361
+ "act_zp_shape": [],
1362
+ "act_zp_dtype": "torch.float8_e4m3fnuz"
1363
+ },
1364
+ "single_transformer_blocks.14.attn.out_v": {
1365
+ "act_scale": 0.0377604179084301,
1366
+ "act_scale_shape": [],
1367
+ "act_zp": 0.0,
1368
+ "act_zp_shape": [],
1369
+ "act_zp_dtype": "torch.float8_e4m3fnuz"
1370
+ },
1371
+ "single_transformer_blocks.15.norm.linear": {},
1372
+ "single_transformer_blocks.15.proj_mlp": {},
1373
+ "single_transformer_blocks.15.proj_out": {},
1374
+ "single_transformer_blocks.15.attn.to_q": {},
1375
+ "single_transformer_blocks.15.attn.to_k": {},
1376
+ "single_transformer_blocks.15.attn.to_v": {},
1377
+ "single_transformer_blocks.15.attn.to_qkv": {},
1378
+ "single_transformer_blocks.15.attn.output_softmax_quant": {
1379
+ "act_scale": 0.00390625,
1380
+ "act_scale_shape": [],
1381
+ "act_zp": 0.0,
1382
+ "act_zp_shape": [],
1383
+ "act_zp_dtype": "torch.float8_e4m3fnuz"
1384
+ },
1385
+ "single_transformer_blocks.15.attn.out_q": {
1386
+ "act_scale": 0.08802083134651184,
1387
+ "act_scale_shape": [],
1388
+ "act_zp": 0.0,
1389
+ "act_zp_shape": [],
1390
+ "act_zp_dtype": "torch.float8_e4m3fnuz"
1391
+ },
1392
+ "single_transformer_blocks.15.attn.out_k": {
1393
+ "act_scale": 0.08906249701976776,
1394
+ "act_scale_shape": [],
1395
+ "act_zp": 0.0,
1396
+ "act_zp_shape": [],
1397
+ "act_zp_dtype": "torch.float8_e4m3fnuz"
1398
+ },
1399
+ "single_transformer_blocks.15.attn.out_v": {
1400
+ "act_scale": 0.04791666567325592,
1401
+ "act_scale_shape": [],
1402
+ "act_zp": 0.0,
1403
+ "act_zp_shape": [],
1404
+ "act_zp_dtype": "torch.float8_e4m3fnuz"
1405
+ },
1406
+ "single_transformer_blocks.16.norm.linear": {},
1407
+ "single_transformer_blocks.16.proj_mlp": {},
1408
+ "single_transformer_blocks.16.proj_out": {},
1409
+ "single_transformer_blocks.16.attn.to_q": {},
1410
+ "single_transformer_blocks.16.attn.to_k": {},
1411
+ "single_transformer_blocks.16.attn.to_v": {},
1412
+ "single_transformer_blocks.16.attn.to_qkv": {},
1413
+ "single_transformer_blocks.16.attn.output_softmax_quant": {
1414
+ "act_scale": 0.003971354104578495,
1415
+ "act_scale_shape": [],
1416
+ "act_zp": 0.0,
1417
+ "act_zp_shape": [],
1418
+ "act_zp_dtype": "torch.float8_e4m3fnuz"
1419
+ },
1420
+ "single_transformer_blocks.16.attn.out_q": {
1421
+ "act_scale": 0.09531249850988388,
1422
+ "act_scale_shape": [],
1423
+ "act_zp": 0.0,
1424
+ "act_zp_shape": [],
1425
+ "act_zp_dtype": "torch.float8_e4m3fnuz"
1426
+ },
1427
+ "single_transformer_blocks.16.attn.out_k": {
1428
+ "act_scale": 0.09375,
1429
+ "act_scale_shape": [],
1430
+ "act_zp": 0.0,
1431
+ "act_zp_shape": [],
1432
+ "act_zp_dtype": "torch.float8_e4m3fnuz"
1433
+ },
1434
+ "single_transformer_blocks.16.attn.out_v": {
1435
+ "act_scale": 0.05078125,
1436
+ "act_scale_shape": [],
1437
+ "act_zp": 0.0,
1438
+ "act_zp_shape": [],
1439
+ "act_zp_dtype": "torch.float8_e4m3fnuz"
1440
+ },
1441
+ "single_transformer_blocks.17.norm.linear": {},
1442
+ "single_transformer_blocks.17.proj_mlp": {},
1443
+ "single_transformer_blocks.17.proj_out": {},
1444
+ "single_transformer_blocks.17.attn.to_q": {},
1445
+ "single_transformer_blocks.17.attn.to_k": {},
1446
+ "single_transformer_blocks.17.attn.to_v": {},
1447
+ "single_transformer_blocks.17.attn.to_qkv": {},
1448
+ "single_transformer_blocks.17.attn.output_softmax_quant": {
1449
+ "act_scale": 0.004166666883975267,
1450
+ "act_scale_shape": [],
1451
+ "act_zp": 0.0,
1452
+ "act_zp_shape": [],
1453
+ "act_zp_dtype": "torch.float8_e4m3fnuz"
1454
+ },
1455
+ "single_transformer_blocks.17.attn.out_q": {
1456
+ "act_scale": 0.0963541641831398,
1457
+ "act_scale_shape": [],
1458
+ "act_zp": 0.0,
1459
+ "act_zp_shape": [],
1460
+ "act_zp_dtype": "torch.float8_e4m3fnuz"
1461
+ },
1462
+ "single_transformer_blocks.17.attn.out_k": {
1463
+ "act_scale": 0.10572917014360428,
1464
+ "act_scale_shape": [],
1465
+ "act_zp": 0.0,
1466
+ "act_zp_shape": [],
1467
+ "act_zp_dtype": "torch.float8_e4m3fnuz"
1468
+ },
1469
+ "single_transformer_blocks.17.attn.out_v": {
1470
+ "act_scale": 0.05416666716337204,
1471
+ "act_scale_shape": [],
1472
+ "act_zp": 0.0,
1473
+ "act_zp_shape": [],
1474
+ "act_zp_dtype": "torch.float8_e4m3fnuz"
1475
+ },
1476
+ "single_transformer_blocks.18.norm.linear": {},
1477
+ "single_transformer_blocks.18.proj_mlp": {},
1478
+ "single_transformer_blocks.18.proj_out": {},
1479
+ "single_transformer_blocks.18.attn.to_q": {},
1480
+ "single_transformer_blocks.18.attn.to_k": {},
1481
+ "single_transformer_blocks.18.attn.to_v": {},
1482
+ "single_transformer_blocks.18.attn.to_qkv": {},
1483
+ "single_transformer_blocks.18.attn.output_softmax_quant": {
1484
+ "act_scale": 0.003922526258975267,
1485
+ "act_scale_shape": [],
1486
+ "act_zp": 0.0,
1487
+ "act_zp_shape": [],
1488
+ "act_zp_dtype": "torch.float8_e4m3fnuz"
1489
+ },
1490
+ "single_transformer_blocks.18.attn.out_q": {
1491
+ "act_scale": 0.06666667014360428,
1492
+ "act_scale_shape": [],
1493
+ "act_zp": 0.0,
1494
+ "act_zp_shape": [],
1495
+ "act_zp_dtype": "torch.float8_e4m3fnuz"
1496
+ },
1497
+ "single_transformer_blocks.18.attn.out_k": {
1498
+ "act_scale": 0.12343750149011612,
1499
+ "act_scale_shape": [],
1500
+ "act_zp": 0.0,
1501
+ "act_zp_shape": [],
1502
+ "act_zp_dtype": "torch.float8_e4m3fnuz"
1503
+ },
1504
+ "single_transformer_blocks.18.attn.out_v": {
1505
+ "act_scale": 0.04635416716337204,
1506
+ "act_scale_shape": [],
1507
+ "act_zp": 0.0,
1508
+ "act_zp_shape": [],
1509
+ "act_zp_dtype": "torch.float8_e4m3fnuz"
1510
+ },
1511
+ "single_transformer_blocks.19.norm.linear": {},
1512
+ "single_transformer_blocks.19.proj_mlp": {},
1513
+ "single_transformer_blocks.19.proj_out": {},
1514
+ "single_transformer_blocks.19.attn.to_q": {},
1515
+ "single_transformer_blocks.19.attn.to_k": {},
1516
+ "single_transformer_blocks.19.attn.to_v": {},
1517
+ "single_transformer_blocks.19.attn.to_qkv": {},
1518
+ "single_transformer_blocks.19.attn.output_softmax_quant": {
1519
+ "act_scale": 0.00403645820915699,
1520
+ "act_scale_shape": [],
1521
+ "act_zp": 0.0,
1522
+ "act_zp_shape": [],
1523
+ "act_zp_dtype": "torch.float8_e4m3fnuz"
1524
+ },
1525
+ "single_transformer_blocks.19.attn.out_q": {
1526
+ "act_scale": 0.06666667014360428,
1527
+ "act_scale_shape": [],
1528
+ "act_zp": 0.0,
1529
+ "act_zp_shape": [],
1530
+ "act_zp_dtype": "torch.float8_e4m3fnuz"
1531
+ },
1532
+ "single_transformer_blocks.19.attn.out_k": {
1533
+ "act_scale": 0.07968749850988388,
1534
+ "act_scale_shape": [],
1535
+ "act_zp": 0.0,
1536
+ "act_zp_shape": [],
1537
+ "act_zp_dtype": "torch.float8_e4m3fnuz"
1538
+ },
1539
+ "single_transformer_blocks.19.attn.out_v": {
1540
+ "act_scale": 0.03333333507180214,
1541
+ "act_scale_shape": [],
1542
+ "act_zp": 0.0,
1543
+ "act_zp_shape": [],
1544
+ "act_zp_dtype": "torch.float8_e4m3fnuz"
1545
+ },
1546
+ "single_transformer_blocks.20.norm.linear": {},
1547
+ "single_transformer_blocks.20.proj_mlp": {},
1548
+ "single_transformer_blocks.20.proj_out": {},
1549
+ "single_transformer_blocks.20.attn.to_q": {},
1550
+ "single_transformer_blocks.20.attn.to_k": {},
1551
+ "single_transformer_blocks.20.attn.to_v": {},
1552
+ "single_transformer_blocks.20.attn.to_qkv": {},
1553
+ "single_transformer_blocks.20.attn.output_softmax_quant": {
1554
+ "act_scale": 0.004134114366024733,
1555
+ "act_scale_shape": [],
1556
+ "act_zp": 0.0,
1557
+ "act_zp_shape": [],
1558
+ "act_zp_dtype": "torch.float8_e4m3fnuz"
1559
+ },
1560
+ "single_transformer_blocks.20.attn.out_q": {
1561
+ "act_scale": 0.08385416865348816,
1562
+ "act_scale_shape": [],
1563
+ "act_zp": 0.0,
1564
+ "act_zp_shape": [],
1565
+ "act_zp_dtype": "torch.float8_e4m3fnuz"
1566
+ },
1567
+ "single_transformer_blocks.20.attn.out_k": {
1568
+ "act_scale": 0.0885416641831398,
1569
+ "act_scale_shape": [],
1570
+ "act_zp": 0.0,
1571
+ "act_zp_shape": [],
1572
+ "act_zp_dtype": "torch.float8_e4m3fnuz"
1573
+ },
1574
+ "single_transformer_blocks.20.attn.out_v": {
1575
+ "act_scale": 0.04661458358168602,
1576
+ "act_scale_shape": [],
1577
+ "act_zp": 0.0,
1578
+ "act_zp_shape": [],
1579
+ "act_zp_dtype": "torch.float8_e4m3fnuz"
1580
+ },
1581
+ "single_transformer_blocks.21.norm.linear": {},
1582
+ "single_transformer_blocks.21.proj_mlp": {},
1583
+ "single_transformer_blocks.21.proj_out": {},
1584
+ "single_transformer_blocks.21.attn.to_q": {},
1585
+ "single_transformer_blocks.21.attn.to_k": {},
1586
+ "single_transformer_blocks.21.attn.to_v": {},
1587
+ "single_transformer_blocks.21.attn.to_qkv": {},
1588
+ "single_transformer_blocks.21.attn.output_softmax_quant": {
1589
+ "act_scale": 0.0037760415580123663,
1590
+ "act_scale_shape": [],
1591
+ "act_zp": 0.0,
1592
+ "act_zp_shape": [],
1593
+ "act_zp_dtype": "torch.float8_e4m3fnuz"
1594
+ },
1595
+ "single_transformer_blocks.21.attn.out_q": {
1596
+ "act_scale": 0.06666667014360428,
1597
+ "act_scale_shape": [],
1598
+ "act_zp": 0.0,
1599
+ "act_zp_shape": [],
1600
+ "act_zp_dtype": "torch.float8_e4m3fnuz"
1601
+ },
1602
+ "single_transformer_blocks.21.attn.out_k": {
1603
+ "act_scale": 0.06562499701976776,
1604
+ "act_scale_shape": [],
1605
+ "act_zp": 0.0,
1606
+ "act_zp_shape": [],
1607
+ "act_zp_dtype": "torch.float8_e4m3fnuz"
1608
+ },
1609
+ "single_transformer_blocks.21.attn.out_v": {
1610
+ "act_scale": 0.03333333507180214,
1611
+ "act_scale_shape": [],
1612
+ "act_zp": 0.0,
1613
+ "act_zp_shape": [],
1614
+ "act_zp_dtype": "torch.float8_e4m3fnuz"
1615
+ },
1616
+ "single_transformer_blocks.22.norm.linear": {},
1617
+ "single_transformer_blocks.22.proj_mlp": {},
1618
+ "single_transformer_blocks.22.proj_out": {},
1619
+ "single_transformer_blocks.22.attn.to_q": {},
1620
+ "single_transformer_blocks.22.attn.to_k": {},
1621
+ "single_transformer_blocks.22.attn.to_v": {},
1622
+ "single_transformer_blocks.22.attn.to_qkv": {},
1623
+ "single_transformer_blocks.22.attn.output_softmax_quant": {
1624
+ "act_scale": 0.0040527344681322575,
1625
+ "act_scale_shape": [],
1626
+ "act_zp": 0.0,
1627
+ "act_zp_shape": [],
1628
+ "act_zp_dtype": "torch.float8_e4m3fnuz"
1629
+ },
1630
+ "single_transformer_blocks.22.attn.out_q": {
1631
+ "act_scale": 0.07916666567325592,
1632
+ "act_scale_shape": [],
1633
+ "act_zp": 0.0,
1634
+ "act_zp_shape": [],
1635
+ "act_zp_dtype": "torch.float8_e4m3fnuz"
1636
+ },
1637
+ "single_transformer_blocks.22.attn.out_k": {
1638
+ "act_scale": 0.07864583283662796,
1639
+ "act_scale_shape": [],
1640
+ "act_zp": 0.0,
1641
+ "act_zp_shape": [],
1642
+ "act_zp_dtype": "torch.float8_e4m3fnuz"
1643
+ },
1644
+ "single_transformer_blocks.22.attn.out_v": {
1645
+ "act_scale": 0.03333333507180214,
1646
+ "act_scale_shape": [],
1647
+ "act_zp": 0.0,
1648
+ "act_zp_shape": [],
1649
+ "act_zp_dtype": "torch.float8_e4m3fnuz"
1650
+ },
1651
+ "single_transformer_blocks.23.norm.linear": {},
1652
+ "single_transformer_blocks.23.proj_mlp": {},
1653
+ "single_transformer_blocks.23.proj_out": {},
1654
+ "single_transformer_blocks.23.attn.to_q": {},
1655
+ "single_transformer_blocks.23.attn.to_k": {},
1656
+ "single_transformer_blocks.23.attn.to_v": {},
1657
+ "single_transformer_blocks.23.attn.to_qkv": {},
1658
+ "single_transformer_blocks.23.attn.output_softmax_quant": {
1659
+ "act_scale": 0.0040527344681322575,
1660
+ "act_scale_shape": [],
1661
+ "act_zp": 0.0,
1662
+ "act_zp_shape": [],
1663
+ "act_zp_dtype": "torch.float8_e4m3fnuz"
1664
+ },
1665
+ "single_transformer_blocks.23.attn.out_q": {
1666
+ "act_scale": 0.09218750149011612,
1667
+ "act_scale_shape": [],
1668
+ "act_zp": 0.0,
1669
+ "act_zp_shape": [],
1670
+ "act_zp_dtype": "torch.float8_e4m3fnuz"
1671
+ },
1672
+ "single_transformer_blocks.23.attn.out_k": {
1673
+ "act_scale": 0.07968749850988388,
1674
+ "act_scale_shape": [],
1675
+ "act_zp": 0.0,
1676
+ "act_zp_shape": [],
1677
+ "act_zp_dtype": "torch.float8_e4m3fnuz"
1678
+ },
1679
+ "single_transformer_blocks.23.attn.out_v": {
1680
+ "act_scale": 0.03333333507180214,
1681
+ "act_scale_shape": [],
1682
+ "act_zp": 0.0,
1683
+ "act_zp_shape": [],
1684
+ "act_zp_dtype": "torch.float8_e4m3fnuz"
1685
+ },
1686
+ "single_transformer_blocks.24.norm.linear": {},
1687
+ "single_transformer_blocks.24.proj_mlp": {},
1688
+ "single_transformer_blocks.24.proj_out": {},
1689
+ "single_transformer_blocks.24.attn.to_q": {},
1690
+ "single_transformer_blocks.24.attn.to_k": {},
1691
+ "single_transformer_blocks.24.attn.to_v": {},
1692
+ "single_transformer_blocks.24.attn.to_qkv": {},
1693
+ "single_transformer_blocks.24.attn.output_softmax_quant": {
1694
+ "act_scale": 0.003922526258975267,
1695
+ "act_scale_shape": [],
1696
+ "act_zp": 0.0,
1697
+ "act_zp_shape": [],
1698
+ "act_zp_dtype": "torch.float8_e4m3fnuz"
1699
+ },
1700
+ "single_transformer_blocks.24.attn.out_q": {
1701
+ "act_scale": 0.06666667014360428,
1702
+ "act_scale_shape": [],
1703
+ "act_zp": 0.0,
1704
+ "act_zp_shape": [],
1705
+ "act_zp_dtype": "torch.float8_e4m3fnuz"
1706
+ },
1707
+ "single_transformer_blocks.24.attn.out_k": {
1708
+ "act_scale": 0.05781250074505806,
1709
+ "act_scale_shape": [],
1710
+ "act_zp": 0.0,
1711
+ "act_zp_shape": [],
1712
+ "act_zp_dtype": "torch.float8_e4m3fnuz"
1713
+ },
1714
+ "single_transformer_blocks.24.attn.out_v": {
1715
+ "act_scale": 0.03333333507180214,
1716
+ "act_scale_shape": [],
1717
+ "act_zp": 0.0,
1718
+ "act_zp_shape": [],
1719
+ "act_zp_dtype": "torch.float8_e4m3fnuz"
1720
+ },
1721
+ "single_transformer_blocks.25.norm.linear": {},
1722
+ "single_transformer_blocks.25.proj_mlp": {},
1723
+ "single_transformer_blocks.25.proj_out": {},
1724
+ "single_transformer_blocks.25.attn.to_q": {},
1725
+ "single_transformer_blocks.25.attn.to_k": {},
1726
+ "single_transformer_blocks.25.attn.to_v": {},
1727
+ "single_transformer_blocks.25.attn.to_qkv": {},
1728
+ "single_transformer_blocks.25.attn.output_softmax_quant": {
1729
+ "act_scale": 0.0040690102614462376,
1730
+ "act_scale_shape": [],
1731
+ "act_zp": 0.0,
1732
+ "act_zp_shape": [],
1733
+ "act_zp_dtype": "torch.float8_e4m3fnuz"
1734
+ },
1735
+ "single_transformer_blocks.25.attn.out_q": {
1736
+ "act_scale": 0.08802083134651184,
1737
+ "act_scale_shape": [],
1738
+ "act_zp": 0.0,
1739
+ "act_zp_shape": [],
1740
+ "act_zp_dtype": "torch.float8_e4m3fnuz"
1741
+ },
1742
+ "single_transformer_blocks.25.attn.out_k": {
1743
+ "act_scale": 0.07604166865348816,
1744
+ "act_scale_shape": [],
1745
+ "act_zp": 0.0,
1746
+ "act_zp_shape": [],
1747
+ "act_zp_dtype": "torch.float8_e4m3fnuz"
1748
+ },
1749
+ "single_transformer_blocks.25.attn.out_v": {
1750
+ "act_scale": 0.03333333507180214,
1751
+ "act_scale_shape": [],
1752
+ "act_zp": 0.0,
1753
+ "act_zp_shape": [],
1754
+ "act_zp_dtype": "torch.float8_e4m3fnuz"
1755
+ },
1756
+ "single_transformer_blocks.26.norm.linear": {},
1757
+ "single_transformer_blocks.26.proj_mlp": {},
1758
+ "single_transformer_blocks.26.proj_out": {},
1759
+ "single_transformer_blocks.26.attn.to_q": {},
1760
+ "single_transformer_blocks.26.attn.to_k": {},
1761
+ "single_transformer_blocks.26.attn.to_v": {},
1762
+ "single_transformer_blocks.26.attn.to_qkv": {},
1763
+ "single_transformer_blocks.26.attn.output_softmax_quant": {
1764
+ "act_scale": 0.0040039061568677425,
1765
+ "act_scale_shape": [],
1766
+ "act_zp": 0.0,
1767
+ "act_zp_shape": [],
1768
+ "act_zp_dtype": "torch.float8_e4m3fnuz"
1769
+ },
1770
+ "single_transformer_blocks.26.attn.out_q": {
1771
+ "act_scale": 0.08124999701976776,
1772
+ "act_scale_shape": [],
1773
+ "act_zp": 0.0,
1774
+ "act_zp_shape": [],
1775
+ "act_zp_dtype": "torch.float8_e4m3fnuz"
1776
+ },
1777
+ "single_transformer_blocks.26.attn.out_k": {
1778
+ "act_scale": 0.07760416716337204,
1779
+ "act_scale_shape": [],
1780
+ "act_zp": 0.0,
1781
+ "act_zp_shape": [],
1782
+ "act_zp_dtype": "torch.float8_e4m3fnuz"
1783
+ },
1784
+ "single_transformer_blocks.26.attn.out_v": {
1785
+ "act_scale": 0.03333333507180214,
1786
+ "act_scale_shape": [],
1787
+ "act_zp": 0.0,
1788
+ "act_zp_shape": [],
1789
+ "act_zp_dtype": "torch.float8_e4m3fnuz"
1790
+ },
1791
+ "single_transformer_blocks.27.norm.linear": {},
1792
+ "single_transformer_blocks.27.proj_mlp": {},
1793
+ "single_transformer_blocks.27.proj_out": {},
1794
+ "single_transformer_blocks.27.attn.to_q": {},
1795
+ "single_transformer_blocks.27.attn.to_k": {},
1796
+ "single_transformer_blocks.27.attn.to_v": {},
1797
+ "single_transformer_blocks.27.attn.to_qkv": {},
1798
+ "single_transformer_blocks.27.attn.output_softmax_quant": {
1799
+ "act_scale": 0.004101562313735485,
1800
+ "act_scale_shape": [],
1801
+ "act_zp": 0.0,
1802
+ "act_zp_shape": [],
1803
+ "act_zp_dtype": "torch.float8_e4m3fnuz"
1804
+ },
1805
+ "single_transformer_blocks.27.attn.out_q": {
1806
+ "act_scale": 0.0859375,
1807
+ "act_scale_shape": [],
1808
+ "act_zp": 0.0,
1809
+ "act_zp_shape": [],
1810
+ "act_zp_dtype": "torch.float8_e4m3fnuz"
1811
+ },
1812
+ "single_transformer_blocks.27.attn.out_k": {
1813
+ "act_scale": 0.09375,
1814
+ "act_scale_shape": [],
1815
+ "act_zp": 0.0,
1816
+ "act_zp_shape": [],
1817
+ "act_zp_dtype": "torch.float8_e4m3fnuz"
1818
+ },
1819
+ "single_transformer_blocks.27.attn.out_v": {
1820
+ "act_scale": 0.04114583507180214,
1821
+ "act_scale_shape": [],
1822
+ "act_zp": 0.0,
1823
+ "act_zp_shape": [],
1824
+ "act_zp_dtype": "torch.float8_e4m3fnuz"
1825
+ },
1826
+ "single_transformer_blocks.28.norm.linear": {},
1827
+ "single_transformer_blocks.28.proj_mlp": {},
1828
+ "single_transformer_blocks.28.proj_out": {},
1829
+ "single_transformer_blocks.28.attn.to_q": {},
1830
+ "single_transformer_blocks.28.attn.to_k": {},
1831
+ "single_transformer_blocks.28.attn.to_v": {},
1832
+ "single_transformer_blocks.28.attn.to_qkv": {},
1833
+ "single_transformer_blocks.28.attn.output_softmax_quant": {
1834
+ "act_scale": 0.0040690102614462376,
1835
+ "act_scale_shape": [],
1836
+ "act_zp": 0.0,
1837
+ "act_zp_shape": [],
1838
+ "act_zp_dtype": "torch.float8_e4m3fnuz"
1839
+ },
1840
+ "single_transformer_blocks.28.attn.out_q": {
1841
+ "act_scale": 0.09427083283662796,
1842
+ "act_scale_shape": [],
1843
+ "act_zp": 0.0,
1844
+ "act_zp_shape": [],
1845
+ "act_zp_dtype": "torch.float8_e4m3fnuz"
1846
+ },
1847
+ "single_transformer_blocks.28.attn.out_k": {
1848
+ "act_scale": 0.109375,
1849
+ "act_scale_shape": [],
1850
+ "act_zp": 0.0,
1851
+ "act_zp_shape": [],
1852
+ "act_zp_dtype": "torch.float8_e4m3fnuz"
1853
+ },
1854
+ "single_transformer_blocks.28.attn.out_v": {
1855
+ "act_scale": 0.03333333507180214,
1856
+ "act_scale_shape": [],
1857
+ "act_zp": 0.0,
1858
+ "act_zp_shape": [],
1859
+ "act_zp_dtype": "torch.float8_e4m3fnuz"
1860
+ },
1861
+ "single_transformer_blocks.29.norm.linear": {},
1862
+ "single_transformer_blocks.29.proj_mlp": {},
1863
+ "single_transformer_blocks.29.proj_out": {},
1864
+ "single_transformer_blocks.29.attn.to_q": {},
1865
+ "single_transformer_blocks.29.attn.to_k": {},
1866
+ "single_transformer_blocks.29.attn.to_v": {},
1867
+ "single_transformer_blocks.29.attn.to_qkv": {},
1868
+ "single_transformer_blocks.29.attn.output_softmax_quant": {
1869
+ "act_scale": 0.004134114366024733,
1870
+ "act_scale_shape": [],
1871
+ "act_zp": 0.0,
1872
+ "act_zp_shape": [],
1873
+ "act_zp_dtype": "torch.float8_e4m3fnuz"
1874
+ },
1875
+ "single_transformer_blocks.29.attn.out_q": {
1876
+ "act_scale": 0.078125,
1877
+ "act_scale_shape": [],
1878
+ "act_zp": 0.0,
1879
+ "act_zp_shape": [],
1880
+ "act_zp_dtype": "torch.float8_e4m3fnuz"
1881
+ },
1882
+ "single_transformer_blocks.29.attn.out_k": {
1883
+ "act_scale": 0.06666667014360428,
1884
+ "act_scale_shape": [],
1885
+ "act_zp": 0.0,
1886
+ "act_zp_shape": [],
1887
+ "act_zp_dtype": "torch.float8_e4m3fnuz"
1888
+ },
1889
+ "single_transformer_blocks.29.attn.out_v": {
1890
+ "act_scale": 0.03984374925494194,
1891
+ "act_scale_shape": [],
1892
+ "act_zp": 0.0,
1893
+ "act_zp_shape": [],
1894
+ "act_zp_dtype": "torch.float8_e4m3fnuz"
1895
+ },
1896
+ "single_transformer_blocks.30.norm.linear": {},
1897
+ "single_transformer_blocks.30.proj_mlp": {},
1898
+ "single_transformer_blocks.30.proj_out": {},
1899
+ "single_transformer_blocks.30.attn.to_q": {},
1900
+ "single_transformer_blocks.30.attn.to_k": {},
1901
+ "single_transformer_blocks.30.attn.to_v": {},
1902
+ "single_transformer_blocks.30.attn.to_qkv": {},
1903
+ "single_transformer_blocks.30.attn.output_softmax_quant": {
1904
+ "act_scale": 0.004134114366024733,
1905
+ "act_scale_shape": [],
1906
+ "act_zp": 0.0,
1907
+ "act_zp_shape": [],
1908
+ "act_zp_dtype": "torch.float8_e4m3fnuz"
1909
+ },
1910
+ "single_transformer_blocks.30.attn.out_q": {
1911
+ "act_scale": 0.06666667014360428,
1912
+ "act_scale_shape": [],
1913
+ "act_zp": 0.0,
1914
+ "act_zp_shape": [],
1915
+ "act_zp_dtype": "torch.float8_e4m3fnuz"
1916
+ },
1917
+ "single_transformer_blocks.30.attn.out_k": {
1918
+ "act_scale": 0.06666667014360428,
1919
+ "act_scale_shape": [],
1920
+ "act_zp": 0.0,
1921
+ "act_zp_shape": [],
1922
+ "act_zp_dtype": "torch.float8_e4m3fnuz"
1923
+ },
1924
+ "single_transformer_blocks.30.attn.out_v": {
1925
+ "act_scale": 0.04505208507180214,
1926
+ "act_scale_shape": [],
1927
+ "act_zp": 0.0,
1928
+ "act_zp_shape": [],
1929
+ "act_zp_dtype": "torch.float8_e4m3fnuz"
1930
+ },
1931
+ "single_transformer_blocks.31.norm.linear": {},
1932
+ "single_transformer_blocks.31.proj_mlp": {},
1933
+ "single_transformer_blocks.31.proj_out": {},
1934
+ "single_transformer_blocks.31.attn.to_q": {},
1935
+ "single_transformer_blocks.31.attn.to_k": {},
1936
+ "single_transformer_blocks.31.attn.to_v": {},
1937
+ "single_transformer_blocks.31.attn.to_qkv": {},
1938
+ "single_transformer_blocks.31.attn.output_softmax_quant": {
1939
+ "act_scale": 0.004166666883975267,
1940
+ "act_scale_shape": [],
1941
+ "act_zp": 0.0,
1942
+ "act_zp_shape": [],
1943
+ "act_zp_dtype": "torch.float8_e4m3fnuz"
1944
+ },
1945
+ "single_transformer_blocks.31.attn.out_q": {
1946
+ "act_scale": 0.0989583358168602,
1947
+ "act_scale_shape": [],
1948
+ "act_zp": 0.0,
1949
+ "act_zp_shape": [],
1950
+ "act_zp_dtype": "torch.float8_e4m3fnuz"
1951
+ },
1952
+ "single_transformer_blocks.31.attn.out_k": {
1953
+ "act_scale": 0.06197916716337204,
1954
+ "act_scale_shape": [],
1955
+ "act_zp": 0.0,
1956
+ "act_zp_shape": [],
1957
+ "act_zp_dtype": "torch.float8_e4m3fnuz"
1958
+ },
1959
+ "single_transformer_blocks.31.attn.out_v": {
1960
+ "act_scale": 0.046875,
1961
+ "act_scale_shape": [],
1962
+ "act_zp": 0.0,
1963
+ "act_zp_shape": [],
1964
+ "act_zp_dtype": "torch.float8_e4m3fnuz"
1965
+ },
1966
+ "single_transformer_blocks.32.norm.linear": {},
1967
+ "single_transformer_blocks.32.proj_mlp": {},
1968
+ "single_transformer_blocks.32.proj_out": {},
1969
+ "single_transformer_blocks.32.attn.to_q": {},
1970
+ "single_transformer_blocks.32.attn.to_k": {},
1971
+ "single_transformer_blocks.32.attn.to_v": {},
1972
+ "single_transformer_blocks.32.attn.to_qkv": {},
1973
+ "single_transformer_blocks.32.attn.output_softmax_quant": {
1974
+ "act_scale": 0.0040690102614462376,
1975
+ "act_scale_shape": [],
1976
+ "act_zp": 0.0,
1977
+ "act_zp_shape": [],
1978
+ "act_zp_dtype": "torch.float8_e4m3fnuz"
1979
+ },
1980
+ "single_transformer_blocks.32.attn.out_q": {
1981
+ "act_scale": 0.10312499850988388,
1982
+ "act_scale_shape": [],
1983
+ "act_zp": 0.0,
1984
+ "act_zp_shape": [],
1985
+ "act_zp_dtype": "torch.float8_e4m3fnuz"
1986
+ },
1987
+ "single_transformer_blocks.32.attn.out_k": {
1988
+ "act_scale": 0.09270833432674408,
1989
+ "act_scale_shape": [],
1990
+ "act_zp": 0.0,
1991
+ "act_zp_shape": [],
1992
+ "act_zp_dtype": "torch.float8_e4m3fnuz"
1993
+ },
1994
+ "single_transformer_blocks.32.attn.out_v": {
1995
+ "act_scale": 0.05156249925494194,
1996
+ "act_scale_shape": [],
1997
+ "act_zp": 0.0,
1998
+ "act_zp_shape": [],
1999
+ "act_zp_dtype": "torch.float8_e4m3fnuz"
2000
+ },
2001
+ "single_transformer_blocks.33.norm.linear": {},
2002
+ "single_transformer_blocks.33.proj_mlp": {},
2003
+ "single_transformer_blocks.33.proj_out": {},
2004
+ "single_transformer_blocks.33.attn.to_q": {},
2005
+ "single_transformer_blocks.33.attn.to_k": {},
2006
+ "single_transformer_blocks.33.attn.to_v": {},
2007
+ "single_transformer_blocks.33.attn.to_qkv": {},
2008
+ "single_transformer_blocks.33.attn.output_softmax_quant": {
2009
+ "act_scale": 0.0040690102614462376,
2010
+ "act_scale_shape": [],
2011
+ "act_zp": 0.0,
2012
+ "act_zp_shape": [],
2013
+ "act_zp_dtype": "torch.float8_e4m3fnuz"
2014
+ },
2015
+ "single_transformer_blocks.33.attn.out_q": {
2016
+ "act_scale": 0.08541666716337204,
2017
+ "act_scale_shape": [],
2018
+ "act_zp": 0.0,
2019
+ "act_zp_shape": [],
2020
+ "act_zp_dtype": "torch.float8_e4m3fnuz"
2021
+ },
2022
+ "single_transformer_blocks.33.attn.out_k": {
2023
+ "act_scale": 0.06666667014360428,
2024
+ "act_scale_shape": [],
2025
+ "act_zp": 0.0,
2026
+ "act_zp_shape": [],
2027
+ "act_zp_dtype": "torch.float8_e4m3fnuz"
2028
+ },
2029
+ "single_transformer_blocks.33.attn.out_v": {
2030
+ "act_scale": 0.04843749850988388,
2031
+ "act_scale_shape": [],
2032
+ "act_zp": 0.0,
2033
+ "act_zp_shape": [],
2034
+ "act_zp_dtype": "torch.float8_e4m3fnuz"
2035
+ },
2036
+ "single_transformer_blocks.34.norm.linear": {},
2037
+ "single_transformer_blocks.34.proj_mlp": {},
2038
+ "single_transformer_blocks.34.proj_out": {},
2039
+ "single_transformer_blocks.34.attn.to_q": {},
2040
+ "single_transformer_blocks.34.attn.to_k": {},
2041
+ "single_transformer_blocks.34.attn.to_v": {},
2042
+ "single_transformer_blocks.34.attn.to_qkv": {},
2043
+ "single_transformer_blocks.34.attn.output_softmax_quant": {
2044
+ "act_scale": 0.0040690102614462376,
2045
+ "act_scale_shape": [],
2046
+ "act_zp": 0.0,
2047
+ "act_zp_shape": [],
2048
+ "act_zp_dtype": "torch.float8_e4m3fnuz"
2049
+ },
2050
+ "single_transformer_blocks.34.attn.out_q": {
2051
+ "act_scale": 0.09427083283662796,
2052
+ "act_scale_shape": [],
2053
+ "act_zp": 0.0,
2054
+ "act_zp_shape": [],
2055
+ "act_zp_dtype": "torch.float8_e4m3fnuz"
2056
+ },
2057
+ "single_transformer_blocks.34.attn.out_k": {
2058
+ "act_scale": 0.12656250596046448,
2059
+ "act_scale_shape": [],
2060
+ "act_zp": 0.0,
2061
+ "act_zp_shape": [],
2062
+ "act_zp_dtype": "torch.float8_e4m3fnuz"
2063
+ },
2064
+ "single_transformer_blocks.34.attn.out_v": {
2065
+ "act_scale": 0.05520833283662796,
2066
+ "act_scale_shape": [],
2067
+ "act_zp": 0.0,
2068
+ "act_zp_shape": [],
2069
+ "act_zp_dtype": "torch.float8_e4m3fnuz"
2070
+ },
2071
+ "single_transformer_blocks.35.norm.linear": {},
2072
+ "single_transformer_blocks.35.proj_mlp": {},
2073
+ "single_transformer_blocks.35.proj_out": {},
2074
+ "single_transformer_blocks.35.attn.to_q": {},
2075
+ "single_transformer_blocks.35.attn.to_k": {},
2076
+ "single_transformer_blocks.35.attn.to_v": {},
2077
+ "single_transformer_blocks.35.attn.to_qkv": {},
2078
+ "single_transformer_blocks.35.attn.output_softmax_quant": {
2079
+ "act_scale": 0.00403645820915699,
2080
+ "act_scale_shape": [],
2081
+ "act_zp": 0.0,
2082
+ "act_zp_shape": [],
2083
+ "act_zp_dtype": "torch.float8_e4m3fnuz"
2084
+ },
2085
+ "single_transformer_blocks.35.attn.out_q": {
2086
+ "act_scale": 0.0989583358168602,
2087
+ "act_scale_shape": [],
2088
+ "act_zp": 0.0,
2089
+ "act_zp_shape": [],
2090
+ "act_zp_dtype": "torch.float8_e4m3fnuz"
2091
+ },
2092
+ "single_transformer_blocks.35.attn.out_k": {
2093
+ "act_scale": 0.08020833134651184,
2094
+ "act_scale_shape": [],
2095
+ "act_zp": 0.0,
2096
+ "act_zp_shape": [],
2097
+ "act_zp_dtype": "torch.float8_e4m3fnuz"
2098
+ },
2099
+ "single_transformer_blocks.35.attn.out_v": {
2100
+ "act_scale": 0.05624999850988388,
2101
+ "act_scale_shape": [],
2102
+ "act_zp": 0.0,
2103
+ "act_zp_shape": [],
2104
+ "act_zp_dtype": "torch.float8_e4m3fnuz"
2105
+ },
2106
+ "single_transformer_blocks.36.norm.linear": {},
2107
+ "single_transformer_blocks.36.proj_mlp": {},
2108
+ "single_transformer_blocks.36.proj_out": {},
2109
+ "single_transformer_blocks.36.attn.to_q": {},
2110
+ "single_transformer_blocks.36.attn.to_k": {},
2111
+ "single_transformer_blocks.36.attn.to_v": {},
2112
+ "single_transformer_blocks.36.attn.to_qkv": {},
2113
+ "single_transformer_blocks.36.attn.output_softmax_quant": {
2114
+ "act_scale": 0.00402018241584301,
2115
+ "act_scale_shape": [],
2116
+ "act_zp": 0.0,
2117
+ "act_zp_shape": [],
2118
+ "act_zp_dtype": "torch.float8_e4m3fnuz"
2119
+ },
2120
+ "single_transformer_blocks.36.attn.out_q": {
2121
+ "act_scale": 0.12604166567325592,
2122
+ "act_scale_shape": [],
2123
+ "act_zp": 0.0,
2124
+ "act_zp_shape": [],
2125
+ "act_zp_dtype": "torch.float8_e4m3fnuz"
2126
+ },
2127
+ "single_transformer_blocks.36.attn.out_k": {
2128
+ "act_scale": 0.10000000149011612,
2129
+ "act_scale_shape": [],
2130
+ "act_zp": 0.0,
2131
+ "act_zp_shape": [],
2132
+ "act_zp_dtype": "torch.float8_e4m3fnuz"
2133
+ },
2134
+ "single_transformer_blocks.36.attn.out_v": {
2135
+ "act_scale": 0.06666667014360428,
2136
+ "act_scale_shape": [],
2137
+ "act_zp": 0.0,
2138
+ "act_zp_shape": [],
2139
+ "act_zp_dtype": "torch.float8_e4m3fnuz"
2140
+ },
2141
+ "single_transformer_blocks.37.norm.linear": {},
2142
+ "single_transformer_blocks.37.proj_mlp": {},
2143
+ "single_transformer_blocks.37.proj_out": {},
2144
+ "single_transformer_blocks.37.attn.to_q": {},
2145
+ "single_transformer_blocks.37.attn.to_k": {},
2146
+ "single_transformer_blocks.37.attn.to_v": {},
2147
+ "single_transformer_blocks.37.attn.to_qkv": {},
2148
+ "single_transformer_blocks.37.attn.output_softmax_quant": {
2149
+ "act_scale": 0.0015055338153615594,
2150
+ "act_scale_shape": [],
2151
+ "act_zp": 0.0,
2152
+ "act_zp_shape": [],
2153
+ "act_zp_dtype": "torch.float8_e4m3fnuz"
2154
+ },
2155
+ "single_transformer_blocks.37.attn.out_q": {
2156
+ "act_scale": 0.0989583358168602,
2157
+ "act_scale_shape": [],
2158
+ "act_zp": 0.0,
2159
+ "act_zp_shape": [],
2160
+ "act_zp_dtype": "torch.float8_e4m3fnuz"
2161
+ },
2162
+ "single_transformer_blocks.37.attn.out_k": {
2163
+ "act_scale": 0.08697916567325592,
2164
+ "act_scale_shape": [],
2165
+ "act_zp": 0.0,
2166
+ "act_zp_shape": [],
2167
+ "act_zp_dtype": "torch.float8_e4m3fnuz"
2168
+ },
2169
+ "single_transformer_blocks.37.attn.out_v": {
2170
+ "act_scale": 0.06666667014360428,
2171
+ "act_scale_shape": [],
2172
+ "act_zp": 0.0,
2173
+ "act_zp_shape": [],
2174
+ "act_zp_dtype": "torch.float8_e4m3fnuz"
2175
+ },
2176
+ "norm_out.linear": {},
2177
+ "proj_out": {}
2178
+ }
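A minimal sketch of how the entries above might be read, using only the Python standard library. It assumes that `act_scale` is a per-tensor scale (consistent with the empty `act_scale_shape`) and that it was derived as the calibrated absolute maximum divided by the largest finite `torch.float8_e4m3fnuz` value, which is 240. That derivation is an assumption, not something the file states. The helper name `calibrated_ranges` and the `SAMPLE` string are hypothetical; the two sample entries are copied from the file.

```python
import json

# Largest finite value representable in torch.float8_e4m3fnuz.
FP8_E4M3FNUZ_MAX = 240.0

# Two entries copied from the file above; the remaining keys use the same layout.
SAMPLE = """
{
  "single_transformer_blocks.37.attn.to_q": {},
  "single_transformer_blocks.37.attn.out_v": {
    "act_scale": 0.06666667014360428,
    "act_scale_shape": [],
    "act_zp": 0.0,
    "act_zp_shape": [],
    "act_zp_dtype": "torch.float8_e4m3fnuz"
  }
}
"""


def calibrated_ranges(params):
    """Map each layer that has quantizer parameters to act_scale * FP8 max.

    Under the assumption scale = amax / FP8 max, the result is the
    activation range the scale was calibrated for.
    """
    return {
        name: entry["act_scale"] * FP8_E4M3FNUZ_MAX
        for name, entry in params.items()
        if entry  # layers stored as {} carry no activation parameters
    }


ranges = calibrated_ranges(json.loads(SAMPLE))
for name, amax in ranges.items():
    print(f"{name}: ~{amax:.3f}")
```

Layers stored as empty objects (for example `attn.to_q`) are skipped, so only the `output_softmax_quant`, `out_q`, `out_k`, and `out_v` entries would be reported when the full file is loaded with `json.load` instead of the sample string.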