Applied AI committed on
Commit eb6ff4a · verified · 1 Parent(s): af7db09

Model save

README.md CHANGED
@@ -2,15 +2,11 @@
 license: mit
 base_model: gpt2
 tags:
-- alignment-handbook
-- trl
-- sft
-- generated_from_trainer
 - trl
 - sft
 - generated_from_trainer
 datasets:
-- appliedai-qx/sample-dataset-ah
+- generator
 model-index:
 - name: gpt2
   results: []
@@ -21,7 +17,7 @@ should probably proofread and complete it, then remove this comment. -->
 
 # gpt2
 
-This model is a fine-tuned version of [gpt2](https://huggingface.co/gpt2) on the appliedai-qx/sample-dataset-ah dataset.
+This model is a fine-tuned version of [gpt2](https://huggingface.co/gpt2) on the generator dataset.
 
 ## Model description
 
@@ -45,9 +41,9 @@ The following hyperparameters were used during training:
 - eval_batch_size: 8
 - seed: 42
 - distributed_type: multi-GPU
-- num_devices: 4
-- total_train_batch_size: 64
-- total_eval_batch_size: 32
+- num_devices: 32
+- total_train_batch_size: 512
+- total_eval_batch_size: 256
 - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
 - lr_scheduler_type: cosine
 - lr_scheduler_warmup_ratio: 0.1
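For orientation, the updated hyperparameters describe a 32-device run: a total train batch size of 512 works out to 16 samples per device, and the per-device eval batch size of 8 matches the 256 total. Below is a minimal sketch of a setup consistent with the card, assuming TRL-style SFT training (per the `trl`/`sft` tags) and no gradient accumulation; the output path and generator body are hypothetical, not taken from the commit.

```python
# A minimal sketch of a setup consistent with the updated model card.
# Assumptions (not stated in the commit): TRL-style SFT training, no
# gradient accumulation; output path and data are hypothetical.
from datasets import Dataset
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="gpt2-sft",           # hypothetical
    per_device_train_batch_size=16,  # 16 x 32 devices = 512 total
    per_device_eval_batch_size=8,    # 8 x 32 devices = 256 total
    seed=42,
    num_train_epochs=1,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
)

def gen():
    # Hypothetical data; a Dataset built from a Python generator reports
    # its builder name as "generator".
    yield {"text": "example training document"}

train_dataset = Dataset.from_generator(gen)
```

A dataset created with `Dataset.from_generator` is recorded under the name `generator`, which appears to be what the auto-generated model card picked up in place of the previous `appliedai-qx/sample-dataset-ah` entry.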
all_results.json CHANGED
@@ -1,9 +1,9 @@
 {
     "epoch": 1.0,
-    "total_flos": 7357983621120000.0,
-    "train_loss": 1.3830752210183577,
-    "train_runtime": 55.0309,
-    "train_samples": 10000,
-    "train_samples_per_second": 255.239,
-    "train_steps_per_second": 3.998
+    "total_flos": 1.011923420184576e+18,
+    "train_loss": 0.9601045426144794,
+    "train_runtime": 898.0661,
+    "train_samples": 1371223,
+    "train_samples_per_second": 2156.094,
+    "train_steps_per_second": 4.211
 }
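As a rough cross-check of the new metrics, the logged rates imply the run grew from about 220 optimizer steps to roughly 3,800; a one-line computation from the values above:

```python
# Approximate optimizer steps implied by the logged metrics
# (steps/sec * runtime); values copied from the diff above.
old_steps = 3.998 * 55.0309    # ~220 steps
new_steps = 4.211 * 898.0661   # ~3782 steps
print(f"old ~ {old_steps:.0f} steps, new ~ {new_steps:.0f} steps")
```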
model.safetensors CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:fb8e6337b30209ac71373e81fd96da57437e0dd62d8e0bf87cccc75d7c16df40
+oid sha256:2aa31b4c25100273b321a8bd3f669cc58c946babae8edc15a5e09670ddcdf324
 size 248894656
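Both versions of `model.safetensors` are Git LFS pointer files: the `oid` is the SHA-256 of the raw weight file, so a changed oid at an identical size (248894656 bytes) means the weights were replaced, not resized. A minimal sketch for verifying a downloaded copy against the new pointer:

```python
import hashlib

# Recompute the Git LFS object id (the SHA-256 of the raw file contents)
# for a locally downloaded model.safetensors and compare it to the new
# pointer. Assumes the file sits in the current working directory.
digest = hashlib.sha256()
with open("model.safetensors", "rb") as f:
    for chunk in iter(lambda: f.read(1 << 20), b""):  # 1 MiB chunks
        digest.update(chunk)

expected = "2aa31b4c25100273b321a8bd3f669cc58c946babae8edc15a5e09670ddcdf324"
print("match" if digest.hexdigest() == expected else "mismatch")
```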
train_results.json CHANGED
@@ -1,9 +1,9 @@
 {
     "epoch": 1.0,
-    "total_flos": 7357983621120000.0,
-    "train_loss": 1.3830752210183577,
-    "train_runtime": 55.0309,
-    "train_samples": 10000,
-    "train_samples_per_second": 255.239,
-    "train_steps_per_second": 3.998
+    "total_flos": 1.011923420184576e+18,
+    "train_loss": 0.9601045426144794,
+    "train_runtime": 898.0661,
+    "train_samples": 1371223,
+    "train_samples_per_second": 2156.094,
+    "train_steps_per_second": 4.211
 }
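In this commit `train_results.json` and `all_results.json` carry identical contents, since only train-split metrics were saved. A small sanity check, assuming the repository has been cloned locally:

```python
import json
from pathlib import Path

# Assumes the repository has been cloned locally.
all_results = json.loads(Path("all_results.json").read_text())
train_results = json.loads(Path("train_results.json").read_text())

assert all_results == train_results  # identical here: only train metrics were saved

print(f"loss {all_results['train_loss']:.4f} over {all_results['train_samples']:,} "
      f"samples in {all_results['train_runtime']:.0f}s")
```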
trainer_state.json CHANGED
The diff for this file is too large to render. See raw diff
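To reproduce results against exactly this snapshot, the commit hash can be pinned when loading; the repo id below is hypothetical and should be replaced with the actual model repository:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "appliedai-qx/gpt2"  # hypothetical repo id; substitute the real one
model = AutoModelForCausalLM.from_pretrained(repo_id, revision="eb6ff4a")
tokenizer = AutoTokenizer.from_pretrained(repo_id, revision="eb6ff4a")
```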