mdroth committed
Commit bc4acfd · verified · 1 Parent(s): 6008d66

End of training

Files changed (3)
  1. README.md +11 -31
  2. model.safetensors +1 -1
  3. training_args.bin +1 -1
README.md CHANGED
@@ -16,7 +16,12 @@ should probably proofread and complete it, then remove this comment. -->

This model is a fine-tuned version of [gpt2](https://huggingface.co/gpt2) on an unknown dataset.
It achieves the following results on the evaluation set:
- - Loss: 0.9517
+ - eval_loss: 1.7598
+ - eval_runtime: 364.6427
+ - eval_samples_per_second: 255.494
+ - eval_steps_per_second: 15.969
+ - epoch: 0.0766
+ - step: 20000

## Model description

@@ -36,42 +41,17 @@ More information needed

The following hyperparameters were used during training:
- learning_rate: 0.0005
- - train_batch_size: 32
- - eval_batch_size: 32
+ - train_batch_size: 16
+ - eval_batch_size: 16
- seed: 42
- - gradient_accumulation_steps: 8
- - total_train_batch_size: 256
+ - gradient_accumulation_steps: 4
+ - total_train_batch_size: 64
- optimizer: Use OptimizerNames.ADAMW_TORCH with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
- lr_scheduler_type: cosine
- - lr_scheduler_warmup_steps: 5000
+ - lr_scheduler_warmup_steps: 1000
- num_epochs: 3
- mixed_precision_training: Native AMP

- ### Training results
-
- | Training Loss | Epoch | Step | Validation Loss |
- |:-------------:|:------:|:------:|:---------------:|
- | 2.4181 | 0.1533 | 10000 | 1.5849 |
- | 1.5361 | 0.3065 | 20000 | 1.4070 |
- | 1.4242 | 0.4598 | 30000 | 1.3356 |
- | 1.3721 | 0.6131 | 40000 | 1.2948 |
- | 1.3327 | 0.7664 | 50000 | 1.2638 |
- | 1.3053 | 0.9196 | 60000 | 1.2403 |
- | 1.2751 | 1.0729 | 70000 | 1.2146 |
- | 1.2491 | 1.2262 | 80000 | 1.1876 |
- | 1.2258 | 1.3795 | 90000 | 1.1642 |
- | 1.1997 | 1.5327 | 100000 | 1.1374 |
- | 1.1724 | 1.6860 | 110000 | 1.1113 |
- | 1.1444 | 1.8393 | 120000 | 1.0826 |
- | 1.1157 | 1.9926 | 130000 | 1.0551 |
- | 1.0762 | 2.1458 | 140000 | 1.0298 |
- | 1.0513 | 2.2991 | 150000 | 1.0042 |
- | 1.0275 | 2.4524 | 160000 | 0.9827 |
- | 1.0067 | 2.6056 | 170000 | 0.9662 |
- | 0.9906 | 2.7589 | 180000 | 0.9556 |
- | 0.9827 | 2.9122 | 190000 | 0.9517 |
-
-
### Framework versions

- Transformers 4.48.3
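For reference, the updated hyperparameters are internally consistent: a per-device batch of 16 with 4 gradient-accumulation steps gives the listed total train batch size of 16 × 4 = 64 (single device), and the new eval throughput checks out as well (255.494 samples/s ÷ 15.969 steps/s ≈ 16, the eval batch size). Below is a minimal sketch of the same configuration expressed as `transformers.TrainingArguments`; the `output_dir` is a placeholder, and the dataset and `Trainer` wiring are not part of this commit.

```python
from transformers import TrainingArguments

# Sketch of the hyperparameters listed in the updated README.
# "gpt2-finetune" is a placeholder output path, not from this commit.
training_args = TrainingArguments(
    output_dir="gpt2-finetune",      # placeholder
    learning_rate=5e-4,              # learning_rate: 0.0005
    per_device_train_batch_size=16,  # train_batch_size: 16
    per_device_eval_batch_size=16,   # eval_batch_size: 16
    seed=42,
    gradient_accumulation_steps=4,   # 16 * 4 = 64 total train batch size
    optim="adamw_torch",             # OptimizerNames.ADAMW_TORCH
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    warmup_steps=1000,               # lr_scheduler_warmup_steps: 1000
    num_train_epochs=3,
    fp16=True,                       # "Native AMP" mixed precision
)
```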
model.safetensors CHANGED
@@ -1,3 +1,3 @@
version https://git-lfs.github.com/spec/v1
- oid sha256:6e1467f88537d221fe26ae721db3ac69cc0ed9e261713823d0e0b69e9eae880d
+ oid sha256:42a2a49483da280d2380cbb11f613a5eb8bb89ddbf9a5eb670a0575e9bf4092f
size 496984704
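The weight file changes only in its LFS oid (the SHA-256 of the blob); the size is identical, as expected when only the tensor values change. A quick sketch for checking a downloaded copy against the pointer, assuming the file sits in the working directory:

```python
import hashlib

# Compare a downloaded file's SHA-256 against the oid in its LFS pointer.
def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

# New model.safetensors digest from this commit.
expected = "42a2a49483da280d2380cbb11f613a5eb8bb89ddbf9a5eb670a0575e9bf4092f"
assert sha256_of("model.safetensors") == expected
```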
training_args.bin CHANGED
@@ -1,3 +1,3 @@
version https://git-lfs.github.com/spec/v1
- oid sha256:dcef805671c5cd07413ed54f02867f0c0d960d31fae1dfe656538f6f3bb285ea
+ oid sha256:9e518a33692ffeead2f83c3d97e15b6a9dca9612df1fcb9a40b051d51655c878
size 5304
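`training_args.bin` is the pickled `TrainingArguments` object that `Trainer` saves next to the weights. A sketch for inspecting it; note that `weights_only=False` is required on recent PyTorch to unpickle arbitrary objects, so only do this for checkpoints you trust:

```python
import torch

# training_args.bin is a pickled TrainingArguments object saved by Trainer.
# weights_only=False lets torch.load unpickle arbitrary Python objects;
# only use it on files from a trusted source.
args = torch.load("training_args.bin", weights_only=False)
print(args.learning_rate, args.per_device_train_batch_size)
```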