neginr committed on
Commit 72828bc · verified · 1 Parent(s): 3223b16

End of training
Files changed (5)
  1. README.md +2 -1
  2. all_results.json +8 -0
  3. train_results.json +8 -0
  4. trainer_state.json +1057 -0
  5. training_loss.png +0 -0
README.md CHANGED
@@ -4,6 +4,7 @@ license: apache-2.0
 base_model: Qwen/Qwen2.5-7B-Instruct
 tags:
 - llama-factory
+- full
 - generated_from_trainer
 model-index:
 - name: no_pipeline_science_100k
@@ -15,7 +16,7 @@ should probably proofread and complete it, then remove this comment. -->
 
 # no_pipeline_science_100k
 
-This model is a fine-tuned version of [Qwen/Qwen2.5-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct) on an unknown dataset.
+This model is a fine-tuned version of [Qwen/Qwen2.5-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct) on the mlfoundations-dev/no_pipeline_science_100k dataset.
 
 ## Model description
 
all_results.json ADDED
@@ -0,0 +1,8 @@
+{
+    "epoch": 4.937369519832985,
+    "total_flos": 3.738141667979428e+18,
+    "train_loss": 0.2130410626016814,
+    "train_runtime": 6079.0514,
+    "train_samples_per_second": 12.591,
+    "train_steps_per_second": 0.024
+}
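As a quick cross-check (an editor's sketch, not part of the commit), the aggregate metrics above are internally consistent: runtime times steps-per-second recovers the 145 optimizer steps recorded in trainer_state.json, and runtime times samples-per-second gives the total samples seen, which in turn implies the effective samples per optimizer step:

```python
# Cross-check of the aggregate metrics reported in all_results.json.
metrics = {
    "train_runtime": 6079.0514,           # seconds
    "train_samples_per_second": 12.591,
    "train_steps_per_second": 0.024,
}

# Optimizer steps implied by throughput; trainer_state.json logs 145 steps
# (the reported train_steps_per_second is rounded, so this is approximate).
steps = metrics["train_runtime"] * metrics["train_steps_per_second"]
print(round(steps))  # 146, matching the 145 logged steps up to rounding

# Total samples seen during training, and the implied effective number of
# samples consumed per optimizer step.
samples = metrics["train_runtime"] * metrics["train_samples_per_second"]
print(round(samples / steps))
```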
train_results.json ADDED
@@ -0,0 +1,8 @@
+{
+    "epoch": 4.937369519832985,
+    "total_flos": 3.738141667979428e+18,
+    "train_loss": 0.2130410626016814,
+    "train_runtime": 6079.0514,
+    "train_samples_per_second": 12.591,
+    "train_steps_per_second": 0.024
+}
trainer_state.json ADDED
@@ -0,0 +1,1057 @@
+{
+  "best_metric": null,
+  "best_model_checkpoint": null,
+  "epoch": 4.937369519832985,
+  "eval_steps": 500,
+  "global_step": 145,
+  "is_hyper_param_search": false,
+  "is_local_process_zero": true,
+  "is_world_process_zero": true,
+  "log_history": [
+    {
+      "epoch": 0.033402922755741124,
+      "grad_norm": 7.154098089649519,
+      "learning_rate": 5.333333333333334e-06,
+      "loss": 1.2049,
+      "step": 1
+    },
+    {
+      "epoch": 0.06680584551148225,
+      "grad_norm": 7.224671814367719,
+      "learning_rate": 1.0666666666666667e-05,
+      "loss": 1.2046,
+      "step": 2
+    },
+    {
+      "epoch": 0.10020876826722339,
+      "grad_norm": 5.112010906482035,
+      "learning_rate": 1.6000000000000003e-05,
+      "loss": 1.1276,
+      "step": 3
+    },
+    {
+      "epoch": 0.1336116910229645,
+      "grad_norm": 5.240191362417293,
+      "learning_rate": 2.1333333333333335e-05,
+      "loss": 1.0958,
+      "step": 4
+    },
+    {
+      "epoch": 0.16701461377870563,
+      "grad_norm": 4.398708169023894,
+      "learning_rate": 2.6666666666666667e-05,
+      "loss": 1.0242,
+      "step": 5
+    },
+    {
+      "epoch": 0.20041753653444677,
+      "grad_norm": 4.9473080352678895,
+      "learning_rate": 3.2000000000000005e-05,
+      "loss": 1.0277,
+      "step": 6
+    },
+    {
+      "epoch": 0.23382045929018788,
+      "grad_norm": 3.8837845230573835,
+      "learning_rate": 3.733333333333334e-05,
+      "loss": 0.9755,
+      "step": 7
+    },
+    {
+      "epoch": 0.267223382045929,
+      "grad_norm": 2.957475416120971,
+      "learning_rate": 4.266666666666667e-05,
+      "loss": 0.9334,
+      "step": 8
+    },
+    {
+      "epoch": 0.30062630480167013,
+      "grad_norm": 2.187999537773017,
+      "learning_rate": 4.8e-05,
+      "loss": 0.9211,
+      "step": 9
+    },
+    {
+      "epoch": 0.33402922755741127,
+      "grad_norm": 2.3155958654718614,
+      "learning_rate": 5.333333333333333e-05,
+      "loss": 0.8983,
+      "step": 10
+    },
+    {
+      "epoch": 0.3674321503131524,
+      "grad_norm": 2.8444701447464436,
+      "learning_rate": 5.8666666666666665e-05,
+      "loss": 0.8975,
+      "step": 11
+    },
+    {
+      "epoch": 0.40083507306889354,
+      "grad_norm": 3.0982586639870213,
+      "learning_rate": 6.400000000000001e-05,
+      "loss": 0.8856,
+      "step": 12
+    },
+    {
+      "epoch": 0.4342379958246347,
+      "grad_norm": 1.8874954111166966,
+      "learning_rate": 6.933333333333334e-05,
+      "loss": 0.872,
+      "step": 13
+    },
+    {
+      "epoch": 0.46764091858037576,
+      "grad_norm": 3.038189077699479,
+      "learning_rate": 7.466666666666667e-05,
+      "loss": 0.8892,
+      "step": 14
+    },
+    {
+      "epoch": 0.5010438413361169,
+      "grad_norm": 1.7381979961116139,
+      "learning_rate": 8e-05,
+      "loss": 0.8535,
+      "step": 15
+    },
+    {
+      "epoch": 0.534446764091858,
+      "grad_norm": 2188.323143153608,
+      "learning_rate": 7.998832056320773e-05,
+      "loss": 1.0923,
+      "step": 16
+    },
+    {
+      "epoch": 0.5678496868475992,
+      "grad_norm": 5.568753243553315,
+      "learning_rate": 7.995328907329308e-05,
+      "loss": 0.9433,
+      "step": 17
+    },
+    {
+      "epoch": 0.6012526096033403,
+      "grad_norm": 3.003707178203899,
+      "learning_rate": 7.989492598765966e-05,
+      "loss": 0.8783,
+      "step": 18
+    },
+    {
+      "epoch": 0.6346555323590815,
+      "grad_norm": 3.3787729580945367,
+      "learning_rate": 7.981326538870596e-05,
+      "loss": 0.8657,
+      "step": 19
+    },
+    {
+      "epoch": 0.6680584551148225,
+      "grad_norm": 3.0096085489081923,
+      "learning_rate": 7.970835496392216e-05,
+      "loss": 0.8705,
+      "step": 20
+    },
+    {
+      "epoch": 0.7014613778705637,
+      "grad_norm": 2.2251853165136186,
+      "learning_rate": 7.958025597804205e-05,
+      "loss": 0.8591,
+      "step": 21
+    },
+    {
+      "epoch": 0.7348643006263048,
+      "grad_norm": 1.393655838420696,
+      "learning_rate": 7.942904323726604e-05,
+      "loss": 0.8202,
+      "step": 22
+    },
+    {
+      "epoch": 0.7682672233820459,
+      "grad_norm": 1.709595774468825,
+      "learning_rate": 7.925480504557654e-05,
+      "loss": 0.8239,
+      "step": 23
+    },
+    {
+      "epoch": 0.8016701461377871,
+      "grad_norm": 1.04923819217978,
+      "learning_rate": 7.90576431531709e-05,
+      "loss": 0.8236,
+      "step": 24
+    },
+    {
+      "epoch": 0.8350730688935282,
+      "grad_norm": 1.4676306786694173,
+      "learning_rate": 7.883767269704209e-05,
+      "loss": 0.8083,
+      "step": 25
+    },
+    {
+      "epoch": 0.8684759916492694,
+      "grad_norm": 417.78005391043996,
+      "learning_rate": 7.859502213374207e-05,
+      "loss": 1.1719,
+      "step": 26
+    },
+    {
+      "epoch": 0.9018789144050104,
+      "grad_norm": 12.203128331470229,
+      "learning_rate": 7.832983316436666e-05,
+      "loss": 0.8597,
+      "step": 27
+    },
+    {
+      "epoch": 0.9352818371607515,
+      "grad_norm": 4.957392164656034,
+      "learning_rate": 7.804226065180615e-05,
+      "loss": 0.9382,
+      "step": 28
+    },
+    {
+      "epoch": 0.9686847599164927,
+      "grad_norm": 10.61998316626802,
+      "learning_rate": 7.773247253030973e-05,
+      "loss": 0.9939,
+      "step": 29
+    },
+    {
+      "epoch": 1.0083507306889352,
+      "grad_norm": 54.87377316789775,
+      "learning_rate": 7.740064970741661e-05,
+      "loss": 0.8724,
+      "step": 30
+    },
+    {
+      "epoch": 1.0417536534446765,
+      "grad_norm": 88.85368950052036,
+      "learning_rate": 7.704698595831107e-05,
+      "loss": 0.9805,
+      "step": 31
+    },
+    {
+      "epoch": 1.0751565762004176,
+      "grad_norm": 12.07917493536492,
+      "learning_rate": 7.667168781266331e-05,
+      "loss": 0.9689,
+      "step": 32
+    },
+    {
+      "epoch": 1.1085594989561587,
+      "grad_norm": 30.730151095536375,
+      "learning_rate": 7.627497443402182e-05,
+      "loss": 1.0908,
+      "step": 33
+    },
+    {
+      "epoch": 1.1419624217118998,
+      "grad_norm": 4.303465706734318,
+      "learning_rate": 7.585707749182816e-05,
+      "loss": 0.8883,
+      "step": 34
+    },
+    {
+      "epoch": 1.1753653444676408,
+      "grad_norm": 1.3597826636785535,
+      "learning_rate": 7.541824102612839e-05,
+      "loss": 0.8376,
+      "step": 35
+    },
+    {
+      "epoch": 1.2087682672233822,
+      "grad_norm": 2.028071696746282,
+      "learning_rate": 7.495872130506072e-05,
+      "loss": 0.8018,
+      "step": 36
+    },
+    {
+      "epoch": 1.2421711899791232,
+      "grad_norm": 1.5668710744698326,
+      "learning_rate": 7.447878667520198e-05,
+      "loss": 0.7901,
+      "step": 37
+    },
+    {
+      "epoch": 1.2755741127348643,
+      "grad_norm": 0.9166659557892114,
+      "learning_rate": 7.397871740486085e-05,
+      "loss": 0.7699,
+      "step": 38
+    },
+    {
+      "epoch": 1.3089770354906054,
+      "grad_norm": 8.207728871032339,
+      "learning_rate": 7.345880552040907e-05,
+      "loss": 0.7735,
+      "step": 39
+    },
+    {
+      "epoch": 1.3423799582463465,
+      "grad_norm": 2.508524585534657,
+      "learning_rate": 7.291935463574626e-05,
+      "loss": 0.8447,
+      "step": 40
+    },
+    {
+      "epoch": 1.3757828810020878,
+      "grad_norm": 1.364586902470756,
+      "learning_rate": 7.236067977499791e-05,
+      "loss": 0.7856,
+      "step": 41
+    },
+    {
+      "epoch": 1.4091858037578289,
+      "grad_norm": 1.8720653087352772,
+      "learning_rate": 7.178310718855018e-05,
+      "loss": 0.7829,
+      "step": 42
+    },
+    {
+      "epoch": 1.44258872651357,
+      "grad_norm": 1.8178558573775088,
+      "learning_rate": 7.11869741625289e-05,
+      "loss": 0.7737,
+      "step": 43
+    },
+    {
+      "epoch": 1.475991649269311,
+      "grad_norm": 1.6638629849138615,
+      "learning_rate": 7.057262882183393e-05,
+      "loss": 0.7737,
+      "step": 44
+    },
+    {
+      "epoch": 1.5093945720250521,
+      "grad_norm": 1.1958951695778888,
+      "learning_rate": 6.994042992684406e-05,
+      "loss": 0.7499,
+      "step": 45
+    },
+    {
+      "epoch": 1.5427974947807934,
+      "grad_norm": 1.1237749762548175,
+      "learning_rate": 6.929074666391095e-05,
+      "loss": 0.7457,
+      "step": 46
+    },
+    {
+      "epoch": 1.5762004175365343,
+      "grad_norm": 0.9523363043794499,
+      "learning_rate": 6.862395842976484e-05,
+      "loss": 0.7449,
+      "step": 47
+    },
+    {
+      "epoch": 1.6096033402922756,
+      "grad_norm": 0.7794493625394828,
+      "learning_rate": 6.79404546099575e-05,
+      "loss": 0.7471,
+      "step": 48
+    },
+    {
+      "epoch": 1.6430062630480167,
+      "grad_norm": 2.1007938107258224,
+      "learning_rate": 6.724063435147189e-05,
+      "loss": 0.738,
+      "step": 49
+    },
+    {
+      "epoch": 1.6764091858037578,
+      "grad_norm": 0.8397590052899939,
+      "learning_rate": 6.652490632963182e-05,
+      "loss": 0.7366,
+      "step": 50
+    },
+    {
+      "epoch": 1.709812108559499,
+      "grad_norm": 1.5244169148841136,
+      "learning_rate": 6.579368850944683e-05,
+      "loss": 0.7518,
+      "step": 51
+    },
+    {
+      "epoch": 1.7432150313152401,
+      "grad_norm": 0.97500410064134,
+      "learning_rate": 6.504740790153255e-05,
+      "loss": 0.7365,
+      "step": 52
+    },
+    {
+      "epoch": 1.7766179540709812,
+      "grad_norm": 1.833109071141926,
+      "learning_rate": 6.428650031274845e-05,
+      "loss": 0.7327,
+      "step": 53
+    },
+    {
+      "epoch": 1.8100208768267223,
+      "grad_norm": 1.4707510085946327,
+      "learning_rate": 6.351141009169893e-05,
+      "loss": 0.7227,
+      "step": 54
+    },
+    {
+      "epoch": 1.8434237995824634,
+      "grad_norm": 1.2363917202252765,
+      "learning_rate": 6.272258986924624e-05,
+      "loss": 0.7405,
+      "step": 55
+    },
+    {
+      "epoch": 1.8768267223382047,
+      "grad_norm": 1.0298920741813498,
+      "learning_rate": 6.192050029418682e-05,
+      "loss": 0.7241,
+      "step": 56
+    },
+    {
+      "epoch": 1.9102296450939458,
+      "grad_norm": 0.9097363351471279,
+      "learning_rate": 6.110560976424531e-05,
+      "loss": 0.7167,
+      "step": 57
+    },
+    {
+      "epoch": 1.9436325678496869,
+      "grad_norm": 0.8471695952793523,
+      "learning_rate": 6.027839415254362e-05,
+      "loss": 0.7181,
+      "step": 58
+    },
+    {
+      "epoch": 1.977035490605428,
+      "grad_norm": 0.6602662698524506,
+      "learning_rate": 5.943933652970424e-05,
+      "loss": 0.7088,
+      "step": 59
+    },
+    {
+      "epoch": 2.0167014613778704,
+      "grad_norm": 0.624041177687339,
+      "learning_rate": 5.858892688175075e-05,
+      "loss": 0.6922,
+      "step": 60
+    },
+    {
+      "epoch": 2.0501043841336117,
+      "grad_norm": 0.731560229530671,
+      "learning_rate": 5.772766182396966e-05,
+      "loss": 0.6655,
+      "step": 61
+    },
+    {
+      "epoch": 2.083507306889353,
+      "grad_norm": 0.5160825456760252,
+      "learning_rate": 5.685604431090117e-05,
+      "loss": 0.6624,
+      "step": 62
+    },
+    {
+      "epoch": 2.116910229645094,
+      "grad_norm": 0.6466642583190281,
+      "learning_rate": 5.597458334262782e-05,
+      "loss": 0.6474,
+      "step": 63
+    },
+    {
+      "epoch": 2.150313152400835,
+      "grad_norm": 0.6905839273768964,
+      "learning_rate": 5.508379366753282e-05,
+      "loss": 0.6512,
+      "step": 64
+    },
+    {
+      "epoch": 2.183716075156576,
+      "grad_norm": 0.3760316450742919,
+      "learning_rate": 5.4184195481701425e-05,
+      "loss": 0.6523,
+      "step": 65
+    },
+    {
+      "epoch": 2.2171189979123174,
+      "grad_norm": 0.606234562718693,
+      "learning_rate": 5.3276314125141144e-05,
+      "loss": 0.6487,
+      "step": 66
+    },
+    {
+      "epoch": 2.2505219206680582,
+      "grad_norm": 0.44809718292050676,
+      "learning_rate": 5.23606797749979e-05,
+      "loss": 0.649,
+      "step": 67
+    },
+    {
+      "epoch": 2.2839248434237995,
+      "grad_norm": 0.40244410097202155,
+      "learning_rate": 5.1437827135947566e-05,
+      "loss": 0.6468,
+      "step": 68
+    },
+    {
+      "epoch": 2.317327766179541,
+      "grad_norm": 0.359719180741915,
+      "learning_rate": 5.050829512794348e-05,
+      "loss": 0.6409,
+      "step": 69
+    },
+    {
+      "epoch": 2.3507306889352817,
+      "grad_norm": 0.40415638024369727,
+      "learning_rate": 4.9572626571502316e-05,
+      "loss": 0.639,
+      "step": 70
+    },
+    {
+      "epoch": 2.384133611691023,
+      "grad_norm": 0.3340843503248373,
+      "learning_rate": 4.8631367870712254e-05,
+      "loss": 0.6326,
+      "step": 71
+    },
+    {
+      "epoch": 2.4175365344467643,
+      "grad_norm": 0.3262882595570267,
+      "learning_rate": 4.768506869414834e-05,
+      "loss": 0.6298,
+      "step": 72
+    },
+    {
+      "epoch": 2.450939457202505,
+      "grad_norm": 0.3253891492249243,
+      "learning_rate": 4.6734281653881536e-05,
+      "loss": 0.6326,
+      "step": 73
+    },
+    {
+      "epoch": 2.4843423799582465,
+      "grad_norm": 0.35311540573233735,
+      "learning_rate": 4.577956198276886e-05,
+      "loss": 0.6291,
+      "step": 74
+    },
+    {
+      "epoch": 2.5177453027139873,
+      "grad_norm": 0.3440383701499157,
+      "learning_rate": 4.4821467210212924e-05,
+      "loss": 0.6332,
+      "step": 75
+    },
+    {
+      "epoch": 2.5511482254697286,
+      "grad_norm": 0.30978369513311105,
+      "learning_rate": 4.386055683658061e-05,
+      "loss": 0.6408,
+      "step": 76
+    },
+    {
+      "epoch": 2.5845511482254695,
+      "grad_norm": 0.3823149004105222,
+      "learning_rate": 4.2897392006470503e-05,
+      "loss": 0.6246,
+      "step": 77
+    },
+    {
+      "epoch": 2.617954070981211,
+      "grad_norm": 0.2810880790539587,
+      "learning_rate": 4.1932535181020286e-05,
+      "loss": 0.6293,
+      "step": 78
+    },
+    {
+      "epoch": 2.651356993736952,
+      "grad_norm": 0.2835535239751324,
+      "learning_rate": 4.096654980944529e-05,
+      "loss": 0.6252,
+      "step": 79
+    },
+    {
+      "epoch": 2.684759916492693,
+      "grad_norm": 0.336833154001104,
+      "learning_rate": 4e-05,
+      "loss": 0.6305,
+      "step": 80
+    },
+    {
+      "epoch": 2.7181628392484343,
+      "grad_norm": 0.23274589850456745,
+      "learning_rate": 3.903345019055472e-05,
+      "loss": 0.6298,
+      "step": 81
+    },
+    {
+      "epoch": 2.7515657620041756,
+      "grad_norm": 0.2420684628004819,
+      "learning_rate": 3.806746481897973e-05,
+      "loss": 0.6241,
+      "step": 82
+    },
+    {
+      "epoch": 2.7849686847599164,
+      "grad_norm": 0.23622928619950834,
+      "learning_rate": 3.710260799352951e-05,
+      "loss": 0.6167,
+      "step": 83
+    },
+    {
+      "epoch": 2.8183716075156577,
+      "grad_norm": 0.21286687906297902,
+      "learning_rate": 3.6139443163419394e-05,
+      "loss": 0.6268,
+      "step": 84
+    },
+    {
+      "epoch": 2.8517745302713986,
+      "grad_norm": 0.20113400910479923,
+      "learning_rate": 3.517853278978708e-05,
+      "loss": 0.622,
+      "step": 85
+    },
+    {
+      "epoch": 2.88517745302714,
+      "grad_norm": 0.19296938971649688,
+      "learning_rate": 3.422043801723116e-05,
+      "loss": 0.6167,
+      "step": 86
+    },
+    {
+      "epoch": 2.9185803757828808,
+      "grad_norm": 0.17640926051127553,
+      "learning_rate": 3.3265718346118464e-05,
+      "loss": 0.6251,
+      "step": 87
+    },
+    {
+      "epoch": 2.951983298538622,
+      "grad_norm": 0.17760201524918323,
+      "learning_rate": 3.231493130585167e-05,
+      "loss": 0.6195,
+      "step": 88
+    },
+    {
+      "epoch": 2.9853862212943634,
+      "grad_norm": 0.18267169419590248,
+      "learning_rate": 3.136863212928776e-05,
+      "loss": 0.6214,
+      "step": 89
+    },
+    {
+      "epoch": 3.1002087682672235,
+      "grad_norm": 0.2479134339023779,
+      "learning_rate": 3.0427373428497704e-05,
+      "loss": 0.5792,
+      "step": 90
+    },
+    {
+      "epoch": 3.1336116910229643,
+      "grad_norm": 0.17829804990091588,
+      "learning_rate": 2.9491704872056525e-05,
+      "loss": 0.571,
+      "step": 91
+    },
+    {
+      "epoch": 3.1670146137787056,
+      "grad_norm": 0.2102957726786887,
+      "learning_rate": 2.8562172864052437e-05,
+      "loss": 0.5665,
+      "step": 92
+    },
+    {
+      "epoch": 3.200417536534447,
+      "grad_norm": 0.18138996143773695,
+      "learning_rate": 2.7639320225002108e-05,
+      "loss": 0.5734,
+      "step": 93
+    },
+    {
+      "epoch": 3.233820459290188,
+      "grad_norm": 0.18231114685106467,
+      "learning_rate": 2.6723685874858873e-05,
+      "loss": 0.5665,
+      "step": 94
+    },
+    {
+      "epoch": 3.267223382045929,
+      "grad_norm": 0.1891068826294468,
+      "learning_rate": 2.5815804518298575e-05,
+      "loss": 0.5649,
+      "step": 95
+    },
+    {
+      "epoch": 3.30062630480167,
+      "grad_norm": 0.1449193634467542,
+      "learning_rate": 2.4916206332467184e-05,
+      "loss": 0.5626,
+      "step": 96
+    },
+    {
+      "epoch": 3.3340292275574113,
+      "grad_norm": 0.17521384625415576,
+      "learning_rate": 2.4025416657372186e-05,
+      "loss": 0.5672,
+      "step": 97
+    },
+    {
+      "epoch": 3.3674321503131526,
+      "grad_norm": 0.17060274829594732,
+      "learning_rate": 2.3143955689098844e-05,
+      "loss": 0.5701,
+      "step": 98
+    },
+    {
+      "epoch": 3.4008350730688934,
+      "grad_norm": 0.16427792254004098,
+      "learning_rate": 2.2272338176030354e-05,
+      "loss": 0.5648,
+      "step": 99
+    },
+    {
+      "epoch": 3.4342379958246347,
+      "grad_norm": 0.16851785214921267,
+      "learning_rate": 2.141107311824926e-05,
+      "loss": 0.5637,
+      "step": 100
+    },
+    {
+      "epoch": 3.4676409185803756,
+      "grad_norm": 0.1647295715319099,
+      "learning_rate": 2.056066347029576e-05,
+      "loss": 0.5698,
+      "step": 101
+    },
+    {
+      "epoch": 3.501043841336117,
+      "grad_norm": 0.14383360405355872,
+      "learning_rate": 1.9721605847456397e-05,
+      "loss": 0.5678,
+      "step": 102
+    },
+    {
+      "epoch": 3.534446764091858,
+      "grad_norm": 0.16369393007489977,
+      "learning_rate": 1.8894390235754686e-05,
+      "loss": 0.5687,
+      "step": 103
+    },
+    {
+      "epoch": 3.567849686847599,
+      "grad_norm": 0.1484533364671656,
+      "learning_rate": 1.807949970581321e-05,
+      "loss": 0.5612,
+      "step": 104
+    },
+    {
+      "epoch": 3.6012526096033404,
+      "grad_norm": 0.13327395767499348,
+      "learning_rate": 1.7277410130753775e-05,
+      "loss": 0.5621,
+      "step": 105
+    },
+    {
+      "epoch": 3.6346555323590817,
+      "grad_norm": 0.14483989970799924,
+      "learning_rate": 1.648858990830108e-05,
+      "loss": 0.5602,
+      "step": 106
+    },
+    {
+      "epoch": 3.6680584551148225,
+      "grad_norm": 0.11501467177953302,
+      "learning_rate": 1.5713499687251554e-05,
+      "loss": 0.5625,
+      "step": 107
+    },
+    {
+      "epoch": 3.701461377870564,
+      "grad_norm": 0.12471522633663724,
+      "learning_rate": 1.4952592098467453e-05,
+      "loss": 0.5566,
+      "step": 108
+    },
+    {
+      "epoch": 3.7348643006263047,
+      "grad_norm": 0.12841415317626956,
+      "learning_rate": 1.4206311490553187e-05,
+      "loss": 0.5563,
+      "step": 109
+    },
+    {
+      "epoch": 3.768267223382046,
+      "grad_norm": 0.13024977809323665,
+      "learning_rate": 1.3475093670368202e-05,
+      "loss": 0.5642,
+      "step": 110
+    },
+    {
+      "epoch": 3.801670146137787,
+      "grad_norm": 0.12141142140280577,
+      "learning_rate": 1.275936564852811e-05,
+      "loss": 0.5619,
+      "step": 111
+    },
+    {
+      "epoch": 3.835073068893528,
+      "grad_norm": 0.1189681977822036,
+      "learning_rate": 1.2059545390042526e-05,
+      "loss": 0.5627,
+      "step": 112
+    },
+    {
+      "epoch": 3.8684759916492695,
+      "grad_norm": 0.11637565722872692,
+      "learning_rate": 1.1376041570235162e-05,
+      "loss": 0.5597,
+      "step": 113
+    },
+    {
+      "epoch": 3.9018789144050103,
+      "grad_norm": 0.11126444342562675,
+      "learning_rate": 1.070925333608907e-05,
+      "loss": 0.5646,
+      "step": 114
+    },
+    {
+      "epoch": 3.9352818371607516,
+      "grad_norm": 0.11144727795080511,
+      "learning_rate": 1.0059570073155953e-05,
+      "loss": 0.5663,
+      "step": 115
+    },
+    {
+      "epoch": 3.968684759916493,
+      "grad_norm": 0.11568625785765184,
+      "learning_rate": 9.427371178166065e-06,
+      "loss": 0.5628,
+      "step": 116
+    },
+    {
+      "epoch": 4.002087682672234,
+      "grad_norm": 0.1172051146855964,
+      "learning_rate": 8.81302583747111e-06,
+      "loss": 0.5657,
+      "step": 117
+    },
+    {
+      "epoch": 4.035490605427975,
+      "grad_norm": 0.1386962246589997,
+      "learning_rate": 8.216892811449834e-06,
+      "loss": 0.5431,
+      "step": 118
+    },
+    {
+      "epoch": 4.068893528183716,
+      "grad_norm": 0.12227065820674828,
+      "learning_rate": 7.639320225002106e-06,
+      "loss": 0.5386,
+      "step": 119
+    },
+    {
+      "epoch": 4.102296450939457,
+      "grad_norm": 0.11676189189940173,
+      "learning_rate": 7.080645364253747e-06,
+      "loss": 0.5341,
+      "step": 120
+    },
+    {
+      "epoch": 4.135699373695198,
+      "grad_norm": 0.1086671391408473,
+      "learning_rate": 6.541194479590931e-06,
+      "loss": 0.5472,
+      "step": 121
+    },
+    {
+      "epoch": 4.16910229645094,
+      "grad_norm": 0.11780635228612878,
+      "learning_rate": 6.021282595139167e-06,
+      "loss": 0.5376,
+      "step": 122
+    },
+    {
+      "epoch": 4.202505219206681,
+      "grad_norm": 0.11122819389457546,
+      "learning_rate": 5.521213324798029e-06,
+      "loss": 0.5405,
+      "step": 123
+    },
+    {
+      "epoch": 4.235908141962422,
+      "grad_norm": 0.11302950909483094,
+      "learning_rate": 5.0412786949392845e-06,
+      "loss": 0.5389,
+      "step": 124
+    },
+    {
+      "epoch": 4.2693110647181625,
+      "grad_norm": 0.10897006989347469,
+      "learning_rate": 4.581758973871609e-06,
+      "loss": 0.5443,
+      "step": 125
+    },
+    {
+      "epoch": 4.302713987473904,
+      "grad_norm": 0.10317913683812792,
+      "learning_rate": 4.142922508171849e-06,
+      "loss": 0.5363,
+      "step": 126
+    },
+    {
+      "epoch": 4.336116910229645,
+      "grad_norm": 0.10003486708202455,
+      "learning_rate": 3.7250255659781844e-06,
+      "loss": 0.5364,
+      "step": 127
+    },
+    {
+      "epoch": 4.369519832985386,
+      "grad_norm": 0.1071731871255614,
+      "learning_rate": 3.3283121873367043e-06,
+      "loss": 0.5432,
+      "step": 128
+    },
+    {
+      "epoch": 4.402922755741128,
+      "grad_norm": 0.10539727253992291,
+      "learning_rate": 2.9530140416889465e-06,
+      "loss": 0.5373,
+      "step": 129
+    },
+    {
+      "epoch": 4.4363256784968685,
+      "grad_norm": 0.09606764766200912,
+      "learning_rate": 2.5993502925834115e-06,
+      "loss": 0.5333,
+      "step": 130
+    },
+    {
+      "epoch": 4.469728601252609,
+      "grad_norm": 0.09219554498498256,
+      "learning_rate": 2.2675274696902737e-06,
+      "loss": 0.5315,
+      "step": 131
+    },
+    {
+      "epoch": 4.503131524008351,
+      "grad_norm": 0.08960509523269163,
+      "learning_rate": 1.957739348193859e-06,
+      "loss": 0.5334,
+      "step": 132
+    },
+    {
+      "epoch": 4.536534446764092,
+      "grad_norm": 0.09160224849384657,
+      "learning_rate": 1.670166835633351e-06,
+      "loss": 0.5384,
+      "step": 133
+    },
+    {
+      "epoch": 4.569937369519833,
+      "grad_norm": 0.08852713488345453,
+      "learning_rate": 1.4049778662579462e-06,
+      "loss": 0.53,
+      "step": 134
+    },
+    {
+      "epoch": 4.603340292275574,
+      "grad_norm": 0.09132315616256415,
+      "learning_rate": 1.1623273029579195e-06,
+      "loss": 0.538,
+      "step": 135
+    },
+    {
+      "epoch": 4.6367432150313155,
+      "grad_norm": 0.09194633813127549,
+      "learning_rate": 9.423568468291156e-07,
+      "loss": 0.541,
+      "step": 136
+    },
+    {
+      "epoch": 4.670146137787056,
+      "grad_norm": 0.09237238332398756,
+      "learning_rate": 7.451949544234627e-07,
+      "loss": 0.5379,
+      "step": 137
+    },
+    {
+      "epoch": 4.703549060542797,
+      "grad_norm": 0.08974432368849375,
+      "learning_rate": 5.709567627339674e-07,
+      "loss": 0.5443,
+      "step": 138
+    },
+    {
+      "epoch": 4.736951983298539,
+      "grad_norm": 0.09013165116820136,
+      "learning_rate": 4.1974402195795514e-07,
+      "loss": 0.535,
+      "step": 139
+    },
+    {
+      "epoch": 4.77035490605428,
+      "grad_norm": 0.09022158123863006,
+      "learning_rate": 2.916450360778411e-07,
+      "loss": 0.5333,
+      "step": 140
+    },
+    {
+      "epoch": 4.803757828810021,
+      "grad_norm": 0.08981410735542258,
+      "learning_rate": 1.867346112940549e-07,
+      "loss": 0.5462,
+      "step": 141
+    },
+    {
+      "epoch": 4.8371607515657615,
+      "grad_norm": 0.09141559902413697,
+      "learning_rate": 1.0507401234035819e-07,
+      "loss": 0.5377,
+      "step": 142
+    },
+    {
+      "epoch": 4.870563674321503,
+      "grad_norm": 0.08951832053531815,
+      "learning_rate": 4.6710926706934336e-08,
+      "loss": 0.5305,
+      "step": 143
+    },
+    {
+      "epoch": 4.903966597077244,
+      "grad_norm": 0.08926873218925392,
+      "learning_rate": 1.1679436792282339e-08,
+      "loss": 0.54,
+      "step": 144
+    },
+    {
+      "epoch": 4.937369519832985,
+      "grad_norm": 0.08723888890212035,
+      "learning_rate": 0.0,
+      "loss": 0.54,
+      "step": 145
+    },
+    {
+      "epoch": 4.937369519832985,
+      "step": 145,
+      "total_flos": 3.738141667979428e+18,
+      "train_loss": 0.2130410626016814,
+      "train_runtime": 6079.0514,
+      "train_samples_per_second": 12.591,
+      "train_steps_per_second": 0.024
+    }
+  ],
+  "logging_steps": 1,
+  "max_steps": 145,
+  "num_input_tokens_seen": 0,
+  "num_train_epochs": 5,
+  "save_steps": 500,
+  "stateful_callbacks": {
+    "TrainerControl": {
+      "args": {
+        "should_epoch_stop": false,
+        "should_evaluate": false,
+        "should_log": false,
+        "should_save": true,
+        "should_training_stop": true
+      },
+      "attributes": {}
+    }
+  },
+  "total_flos": 3.738141667979428e+18,
+  "train_batch_size": 1,
+  "trial_name": null,
+  "trial_params": null
+}
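The learning_rate column in the log above is consistent with linear warmup over the first 15 steps to a peak of 8e-05, followed by cosine decay to zero at step 145. This is a fit inferred from the logged values, not a setting read from the commit; a sketch that reproduces them:

```python
import math

# Inferred schedule (an assumption fitted to the log, not a committed config):
# linear warmup for 15 steps to a peak of 8e-05, then cosine decay to zero
# at step 145.
PEAK, WARMUP_STEPS, TOTAL_STEPS = 8e-05, 15, 145

def lr_at(step: int) -> float:
    if step <= WARMUP_STEPS:
        return PEAK * step / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS)
    return PEAK * 0.5 * (1.0 + math.cos(math.pi * progress))

print(lr_at(1))    # 5.333...e-06, as logged at step 1
print(lr_at(16))   # ~7.99883e-05, matching the step-16 entry
print(lr_at(145))  # decays to (numerically) zero, as in the final entry
```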
training_loss.png ADDED
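training_loss.png is the rendered loss curve for this run. A curve like it can be rebuilt from the log_history entries in trainer_state.json; below is a minimal sketch, where the inline JSON is a two-entry excerpt standing in for the full file and the matplotlib lines are an assumption about how the plot was produced:

```python
import json

# A short excerpt of trainer_state.json's log_history, standing in for the
# full file (which would normally be read with json.load(open(path))).
state = json.loads("""
{
  "global_step": 145,
  "log_history": [
    {"epoch": 0.0334, "grad_norm": 7.15, "learning_rate": 5.33e-06, "loss": 1.2049, "step": 1},
    {"epoch": 4.9374, "grad_norm": 0.087, "learning_rate": 0.0, "loss": 0.54, "step": 145},
    {"epoch": 4.9374, "step": 145, "train_loss": 0.213, "train_runtime": 6079.05}
  ]
}
""")

# Keep only per-step entries; the trailing summary entry carries no "loss" key.
points = [(e["step"], e["loss"]) for e in state["log_history"] if "loss" in e]
steps, losses = zip(*points)
print(points)

# To render the curve as an image (assumed, not part of the commit):
#   import matplotlib.pyplot as plt
#   plt.plot(steps, losses)
#   plt.xlabel("step"); plt.ylabel("training loss")
#   plt.savefig("training_loss.png")
```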