lemexp-task1-v2-template_small-deepseek-coder-1.3b-base-ddp-8lr-v2

This model is a fine-tuned version of deepseek-ai/deepseek-coder-1.3b-base on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 0.1630

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (an equivalent configuration sketch is shown after the list):

  • learning_rate: 0.0008
  • train_batch_size: 2
  • eval_batch_size: 2
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 8
  • total_train_batch_size: 16
  • total_eval_batch_size: 16
  • optimizer: AdamW (torch) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: linear
  • num_epochs: 12
  • mixed_precision_training: Native AMP
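
For reference, below is a hedged sketch of an equivalent Hugging Face TrainingArguments configuration mirroring the hyperparameters above. The output directory and launch command are placeholders, since the original training script is not part of this card.

```python
# Minimal sketch, not the original training script.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./lemexp-task1-v2-ddp-8lr-v2",  # hypothetical output path
    learning_rate=8e-4,
    per_device_train_batch_size=2,              # 2 per device x 8 GPUs = 16 total
    per_device_eval_batch_size=2,
    seed=42,
    num_train_epochs=12,
    lr_scheduler_type="linear",
    optim="adamw_torch",                        # AdamW, betas=(0.9, 0.999), eps=1e-8
    fp16=True,                                  # Native AMP mixed precision
)
# Launched with DDP across 8 GPUs, e.g.:
#   torchrun --nproc_per_node=8 train.py
```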

Training results

| Training Loss | Epoch   | Step  | Validation Loss |
|:-------------:|:-------:|:-----:|:---------------:|
| 0.3915        | 0.2001  | 720   | 0.3026          |
| 0.2875        | 0.4001  | 1440  | 0.2625          |
| 0.254         | 0.6002  | 2160  | 0.2493          |
| 0.2464        | 0.8002  | 2880  | 0.2448          |
| 0.2334        | 1.0003  | 3600  | 0.2371          |
| 0.2206        | 1.2003  | 4320  | 0.2485          |
| 0.2162        | 1.4004  | 5040  | 0.2268          |
| 0.2149        | 1.6004  | 5760  | 0.2221          |
| 0.2132        | 1.8005  | 6480  | 0.2160          |
| 0.2071        | 2.0006  | 7200  | 0.2112          |
| 0.1996        | 2.2006  | 7920  | 0.2102          |
| 0.1972        | 2.4007  | 8640  | 0.2123          |
| 0.1936        | 2.6007  | 9360  | 0.2065          |
| 0.1931        | 2.8008  | 10080 | 0.1994          |
| 0.1937        | 3.0008  | 10800 | 0.2040          |
| 0.1812        | 3.2009  | 11520 | 0.1981          |
| 0.1849        | 3.4009  | 12240 | 0.2016          |
| 0.1791        | 3.6010  | 12960 | 0.2017          |
| 0.1785        | 3.8011  | 13680 | 0.1919          |
| 0.1784        | 4.0011  | 14400 | 0.1907          |
| 0.1683        | 4.2012  | 15120 | 0.1929          |
| 0.1697        | 4.4012  | 15840 | 0.1862          |
| 0.1658        | 4.6013  | 16560 | 0.1837          |
| 0.1673        | 4.8013  | 17280 | 0.1918          |
| 0.1641        | 5.0014  | 18000 | 0.1847          |
| 0.1549        | 5.2014  | 18720 | 0.1813          |
| 0.1542        | 5.4015  | 19440 | 0.1885          |
| 0.1561        | 5.6016  | 20160 | 0.1792          |
| 0.1569        | 5.8016  | 20880 | 0.1751          |
| 0.1513        | 6.0017  | 21600 | 0.1710          |
| 0.143         | 6.2017  | 22320 | 0.1737          |
| 0.1443        | 6.4018  | 23040 | 0.1725          |
| 0.1422        | 6.6018  | 23760 | 0.1689          |
| 0.1444        | 6.8019  | 24480 | 0.1668          |
| 0.1406        | 7.0019  | 25200 | 0.1649          |
| 0.1325        | 7.2020  | 25920 | 0.1691          |
| 0.1312        | 7.4021  | 26640 | 0.1656          |
| 0.1307        | 7.6021  | 27360 | 0.1629          |
| 0.1288        | 7.8022  | 28080 | 0.1638          |
| 0.131         | 8.0022  | 28800 | 0.1632          |
| 0.1201        | 8.2023  | 29520 | 0.1647          |
| 0.1179        | 8.4023  | 30240 | 0.1626          |
| 0.1188        | 8.6024  | 30960 | 0.1619          |
| 0.1176        | 8.8024  | 31680 | 0.1569          |
| 0.1182        | 9.0025  | 32400 | 0.1578          |
| 0.1067        | 9.2026  | 33120 | 0.1634          |
| 0.1071        | 9.4026  | 33840 | 0.1586          |
| 0.1075        | 9.6027  | 34560 | 0.1557          |
| 0.1067        | 9.8027  | 35280 | 0.1536          |
| 0.1038        | 10.0028 | 36000 | 0.1572          |
| 0.0954        | 10.2028 | 36720 | 0.1634          |
| 0.0949        | 10.4029 | 37440 | 0.1577          |
| 0.0944        | 10.6029 | 38160 | 0.1591          |
| 0.0944        | 10.8030 | 38880 | 0.1575          |
| 0.0943        | 11.0031 | 39600 | 0.1551          |
| 0.0855        | 11.2031 | 40320 | 0.1632          |
| 0.0848        | 11.4032 | 41040 | 0.1618          |
| 0.0841        | 11.6032 | 41760 | 0.1619          |
| 0.0838        | 11.8033 | 42480 | 0.1630          |

Framework versions

  • PEFT 0.14.0
  • Transformers 4.47.0
  • Pytorch 2.5.1+cu124
  • Datasets 3.2.0
  • Tokenizers 0.21.0
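
Given the PEFT and Transformers versions listed above, the adapter can be loaded onto the base model roughly as follows. This is a minimal sketch, assuming the adapter weights are published under the repository name in the title; device placement and generation settings are left to the user.

```python
# Minimal sketch: load the base model, then apply this PEFT adapter.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained("deepseek-ai/deepseek-coder-1.3b-base")
tokenizer = AutoTokenizer.from_pretrained("deepseek-ai/deepseek-coder-1.3b-base")

# Adapter repo name assumed from the title of this card.
model = PeftModel.from_pretrained(
    base,
    "yalhessi/lemexp-task1-v2-template_small-deepseek-coder-1.3b-base-ddp-8lr-v2",
)
model.eval()
```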