lemexp-task1-lemma_command_small-deepseek-coder-1.3b-base-ddp-8lr

This model is a fine-tuned version of deepseek-ai/deepseek-coder-1.3b-base on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 0.4760

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a configuration sketch follows the list):

  • learning_rate: 0.0008
  • train_batch_size: 2
  • eval_batch_size: 2
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 8
  • total_train_batch_size: 16
  • total_eval_batch_size: 16
  • optimizer: AdamW (adamw_torch) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: linear
  • num_epochs: 12
  • mixed_precision_training: Native AMP
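
For reference, these settings map roughly onto a Hugging Face TrainingArguments configuration as sketched below. This is only a sketch under the reported hyperparameters, not the actual training script: the dataset, adapter setup, and output_dir are not documented here, and output_dir below is a placeholder.

```python
from transformers import TrainingArguments

# Sketch of the reported hyperparameters. A per-device batch size of 2 on
# 8 GPUs gives the reported total batch size of 16. output_dir is a placeholder.
training_args = TrainingArguments(
    output_dir="lemexp-task1-lemma_command_small-deepseek-coder-1.3b-base-ddp-8lr",
    learning_rate=8e-4,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    seed=42,
    num_train_epochs=12,
    lr_scheduler_type="linear",
    optim="adamw_torch",  # AdamW with betas=(0.9, 0.999), eps=1e-08
    fp16=True,            # Native AMP mixed-precision training
)
```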

Training results

| Training Loss | Epoch   | Step  | Validation Loss |
|:-------------:|:-------:|:-----:|:---------------:|
| No log        | 0.2003  | 461   | 0.6892          |
| 0.7574        | 0.4005  | 922   | 0.6424          |
| 0.6677        | 0.6008  | 1383  | 0.6216          |
| 0.6303        | 0.8010  | 1844  | 0.5984          |
| 0.6041        | 1.0013  | 2305  | 0.5879          |
| 0.5735        | 1.2016  | 2766  | 0.5722          |
| 0.5391        | 1.4018  | 3227  | 0.5680          |
| 0.5391        | 1.6021  | 3688  | 0.5550          |
| 0.535         | 1.8023  | 4149  | 0.5425          |
| 0.5287        | 2.0026  | 4610  | 0.5379          |
| 0.4853        | 2.2029  | 5071  | 0.5346          |
| 0.4814        | 2.4031  | 5532  | 0.5217          |
| 0.4814        | 2.6034  | 5993  | 0.5209          |
| 0.4788        | 2.8036  | 6454  | 0.5221          |
| 0.4838        | 3.0039  | 6915  | 0.5175          |
| 0.4701        | 3.2042  | 7376  | 0.5148          |
| 0.4385        | 3.4044  | 7837  | 0.5113          |
| 0.4364        | 3.6047  | 8298  | 0.5008          |
| 0.4414        | 3.8050  | 8759  | 0.4942          |
| 0.4437        | 4.0052  | 9220  | 0.4970          |
| 0.417         | 4.2055  | 9681  | 0.4972          |
| 0.4037        | 4.4057  | 10142 | 0.4994          |
| 0.4084        | 4.6060  | 10603 | 0.4868          |
| 0.4124        | 4.8063  | 11064 | 0.4834          |
| 0.411         | 5.0065  | 11525 | 0.4899          |
| 0.411         | 5.2068  | 11986 | 0.4813          |
| 0.3653        | 5.4070  | 12447 | 0.4861          |
| 0.3768        | 5.6073  | 12908 | 0.4777          |
| 0.3801        | 5.8076  | 13369 | 0.4807          |
| 0.3758        | 6.0078  | 13830 | 0.4789          |
| 0.3642        | 6.2081  | 14291 | 0.4871          |
| 0.3438        | 6.4083  | 14752 | 0.4755          |
| 0.3464        | 6.6086  | 15213 | 0.4723          |
| 0.3487        | 6.8089  | 15674 | 0.4667          |
| 0.3492        | 7.0091  | 16135 | 0.4702          |
| 0.3173        | 7.2094  | 16596 | 0.4775          |
| 0.3177        | 7.4096  | 17057 | 0.4680          |
| 0.3195        | 7.6099  | 17518 | 0.4652          |
| 0.3195        | 7.8102  | 17979 | 0.4677          |
| 0.3196        | 8.0104  | 18440 | 0.4718          |
| 0.3155        | 8.2107  | 18901 | 0.4697          |
| 0.2838        | 8.4109  | 19362 | 0.4645          |
| 0.2915        | 8.6112  | 19823 | 0.4662          |
| 0.2895        | 8.8115  | 20284 | 0.4538          |
| 0.291         | 9.0117  | 20745 | 0.4706          |
| 0.2699        | 9.2120  | 21206 | 0.4666          |
| 0.2573        | 9.4123  | 21667 | 0.4654          |
| 0.264         | 9.6125  | 22128 | 0.4646          |
| 0.2642        | 9.8128  | 22589 | 0.4619          |
| 0.2647        | 10.0130 | 23050 | 0.4739          |
| 0.2333        | 10.2133 | 23511 | 0.4733          |
| 0.2333        | 10.4136 | 23972 | 0.4706          |
| 0.2339        | 10.6138 | 24433 | 0.4674          |
| 0.2347        | 10.8141 | 24894 | 0.4656          |
| 0.2391        | 11.0143 | 25355 | 0.4787          |
| 0.2242        | 11.2146 | 25816 | 0.4788          |
| 0.2089        | 11.4149 | 26277 | 0.4824          |
| 0.211         | 11.6151 | 26738 | 0.4767          |
| 0.2152        | 11.8154 | 27199 | 0.4760          |
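
The evaluation cadence in the table can be checked with a little arithmetic; the sketch below is derived only from the table and hyperparameter values above (the true dataset size is not documented).

```python
# Back-of-the-envelope check of the evaluation cadence in the table above.
final_step, final_epoch = 27199, 11.8154
steps_per_epoch = final_step / final_epoch   # ~2302 optimizer steps per epoch

print(461 / steps_per_epoch)   # ~0.20 epochs between evaluation rows
print(steps_per_epoch * 16)    # ~36.8k examples seen per epoch at total batch size 16
```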

Framework versions

  • PEFT 0.14.0
  • Transformers 4.47.0
  • Pytorch 2.5.1+cu124
  • Datasets 3.2.0
  • Tokenizers 0.21.0
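
Since PEFT appears in the framework versions and the checkpoint is an adapter for deepseek-ai/deepseek-coder-1.3b-base, it can be loaded by attaching the adapter to the base model. The following is a minimal loading sketch; the prompt format the adapter expects is not documented here.

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "deepseek-ai/deepseek-coder-1.3b-base"
adapter_id = "yalhessi/lemexp-task1-lemma_command_small-deepseek-coder-1.3b-base-ddp-8lr"

tokenizer = AutoTokenizer.from_pretrained(base_id)
base_model = AutoModelForCausalLM.from_pretrained(base_id)

# Attach the fine-tuned adapter weights to the frozen base model.
model = PeftModel.from_pretrained(base_model, adapter_id)
model.eval()
```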