lemexp-task1-lemma_object_full-deepseek-coder-1.3b-base-ddp-8lr

This model is a fine-tuned version of deepseek-ai/deepseek-coder-1.3b-base on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 0.2604

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0008
  • train_batch_size: 2
  • eval_batch_size: 2
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 8
  • total_train_batch_size: 16
  • total_eval_batch_size: 16
  • optimizer: AdamW (torch implementation) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: linear
  • num_epochs: 18
  • mixed_precision_training: Native AMP
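
For reference, here is a minimal sketch of how the hyperparameters listed above might map to Hugging Face `TrainingArguments`. The output directory, the use of `fp16` for "Native AMP", and the surrounding Trainer/dataset setup are assumptions and are not documented in this card; distributed training over 8 GPUs (DDP) would be handled by the launcher (e.g. `torchrun`), not by these arguments.

```python
from transformers import TrainingArguments

# Sketch only: mirrors the hyperparameter list above; names marked "assumed"
# are illustrative and not taken from this card.
training_args = TrainingArguments(
    output_dir="lemexp-task1-lemma_object_full-ddp-8lr",  # assumed
    learning_rate=8e-4,
    per_device_train_batch_size=2,   # 2 per device x 8 devices = 16 total
    per_device_eval_batch_size=2,
    seed=42,
    num_train_epochs=18,
    lr_scheduler_type="linear",
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    fp16=True,                       # "Native AMP"; fp16 vs. bf16 is assumed
)
```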

Training results

| Training Loss | Epoch   | Step   | Validation Loss |
|:-------------:|:-------:|:------:|:---------------:|
| 0.5095        | 0.2000  | 2902   | 0.5062          |
| 0.4742        | 0.4001  | 5804   | 0.4674          |
| 0.4587        | 0.6001  | 8706   | 0.4537          |
| 0.4423        | 0.8001  | 11608  | 0.4393          |
| 0.4428        | 1.0001  | 14510  | 0.4372          |
| 0.4258        | 1.2002  | 17412  | 0.4330          |
| 0.422         | 1.4002  | 20314  | 0.4243          |
| 0.4149        | 1.6002  | 23216  | 0.4243          |
| 0.4114        | 1.8002  | 26118  | 0.4074          |
| 0.4061        | 2.0003  | 29020  | 0.4055          |
| 0.3983        | 2.2003  | 31922  | 0.4032          |
| 0.393         | 2.4003  | 34824  | 0.3958          |
| 0.3975        | 2.6004  | 37726  | 0.3877          |
| 0.3895        | 2.8004  | 40628  | 0.3948          |
| 0.3816        | 3.0004  | 43530  | 0.3825          |
| 0.373         | 3.2004  | 46432  | 0.3825          |
| 0.374         | 3.4005  | 49334  | 0.3808          |
| 0.3746        | 3.6005  | 52236  | 0.3770          |
| 0.3738        | 3.8005  | 55138  | 0.3729          |
| 0.3691        | 4.0006  | 58040  | 0.3665          |
| 0.3585        | 4.2006  | 60942  | 0.3661          |
| 0.3603        | 4.4006  | 63844  | 0.3672          |
| 0.358         | 4.6006  | 66746  | 0.3587          |
| 0.3491        | 4.8007  | 69648  | 0.3527          |
| 0.3513        | 5.0007  | 72550  | 0.3507          |
| 0.3434        | 5.2007  | 75452  | 0.3515          |
| 0.3398        | 5.4007  | 78354  | 0.3479          |
| 0.3406        | 5.6008  | 81256  | 0.3465          |
| 0.3355        | 5.8008  | 84158  | 0.3406          |
| 0.3312        | 6.0008  | 87060  | 0.3377          |
| 0.3226        | 6.2009  | 89962  | 0.3376          |
| 0.3163        | 6.4009  | 92864  | 0.3321          |
| 0.3211        | 6.6009  | 95766  | 0.3275          |
| 0.3165        | 6.8009  | 98668  | 0.3246          |
| 0.3178        | 7.0010  | 101570 | 0.3172          |
| 0.3028        | 7.2010  | 104472 | 0.3183          |
| 0.3043        | 7.4010  | 107374 | 0.3165          |
| 0.3032        | 7.6010  | 110276 | 0.3131          |
| 0.3042        | 7.8011  | 113178 | 0.3088          |
| 0.2973        | 8.0011  | 116080 | 0.3101          |
| 0.2855        | 8.2011  | 118982 | 0.3037          |
| 0.2825        | 8.4012  | 121884 | 0.3014          |
| 0.2865        | 8.6012  | 124786 | 0.3024          |
| 0.2814        | 8.8012  | 127688 | 0.2963          |
| 0.2807        | 9.0012  | 130590 | 0.2922          |
| 0.2686        | 9.2013  | 133492 | 0.2937          |
| 0.2679        | 9.4013  | 136394 | 0.2872          |
| 0.2678        | 9.6013  | 139296 | 0.2870          |
| 0.2609        | 9.8014  | 142198 | 0.2839          |
| 0.2623        | 10.0014 | 145100 | 0.2808          |
| 0.2497        | 10.2014 | 148002 | 0.2788          |
| 0.2451        | 10.4014 | 150904 | 0.2753          |
| 0.2473        | 10.6015 | 153806 | 0.2743          |
| 0.2456        | 10.8015 | 156708 | 0.2694          |
| 0.2449        | 11.0015 | 159610 | 0.2698          |
| 0.2296        | 11.2015 | 162512 | 0.2697          |
| 0.2321        | 11.4016 | 165414 | 0.2684          |
| 0.2291        | 11.6016 | 168316 | 0.2672          |
| 0.2296        | 11.8016 | 171218 | 0.2651          |
| 0.2296        | 12.0017 | 174120 | 0.2723          |
| 0.2663        | 12.2017 | 177022 | 0.2968          |
| 0.2718        | 12.4017 | 179924 | 0.2948          |
| 0.2773        | 12.6017 | 182826 | 0.2954          |
| 0.2749        | 12.8018 | 185728 | 0.2944          |
| 0.2754        | 13.0018 | 188630 | 0.2876          |
| 0.2629        | 13.2018 | 191532 | 0.2903          |
| 0.2606        | 13.4018 | 194434 | 0.2866          |
| 0.2665        | 13.6019 | 197336 | 0.2836          |
| 0.2621        | 13.8019 | 200238 | 0.2871          |
| 0.2629        | 14.0019 | 203140 | 0.2840          |
| 0.2498        | 14.2020 | 206042 | 0.2823          |
| 0.2533        | 14.4020 | 208944 | 0.2788          |
| 0.2505        | 14.6020 | 211846 | 0.2781          |
| 0.2515        | 14.8020 | 214748 | 0.2726          |
| 0.2505        | 15.0021 | 217650 | 0.2767          |
| 0.2387        | 15.2021 | 220552 | 0.2736          |
| 0.2348        | 15.4021 | 223454 | 0.2728          |
| 0.236         | 15.6022 | 226356 | 0.2686          |
| 0.2423        | 15.8022 | 229258 | 0.2664          |
| 0.2365        | 16.0022 | 232160 | 0.2671          |
| 0.2242        | 16.2022 | 235062 | 0.2659          |
| 0.2241        | 16.4023 | 237964 | 0.2680          |
| 0.2277        | 16.6023 | 240866 | 0.2634          |
| 0.2253        | 16.8023 | 243768 | 0.2624          |
| 0.2237        | 17.0023 | 246670 | 0.2628          |
| 0.215         | 17.2024 | 249572 | 0.2638          |
| 0.2141        | 17.4024 | 252474 | 0.2609          |
| 0.2145        | 17.6024 | 255376 | 0.2604          |
| 0.2101        | 17.8025 | 258278 | 0.2604          |

Framework versions

  • PEFT 0.14.0
  • Transformers 4.47.0
  • Pytorch 2.5.1+cu124
  • Datasets 3.2.0
  • Tokenizers 0.21.0
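
Because this repository contains a PEFT adapter for deepseek-ai/deepseek-coder-1.3b-base (trained with the PEFT and Transformers versions listed above), a minimal loading sketch is shown below. The prompt and generation settings are illustrative assumptions; the card does not document the expected input format.

```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "deepseek-ai/deepseek-coder-1.3b-base"
adapter_id = "yalhessi/lemexp-task1-lemma_object_full-deepseek-coder-1.3b-base-ddp-8lr"

tokenizer = AutoTokenizer.from_pretrained(base_id)
base_model = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype=torch.float16)

# Attach the fine-tuned adapter weights on top of the frozen base model.
model = PeftModel.from_pretrained(base_model, adapter_id)
model.eval()

# Illustrative prompt only (assumed), since the intended input format is not documented.
inputs = tokenizer("lemma ", return_tensors="pt")
with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```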