# lemexp-task1-lemma_object_full-deepseek-coder-1.3b-base-ddp-8lr
This model is a fine-tuned version of deepseek-ai/deepseek-coder-1.3b-base on an unknown dataset. It achieves the following results on the evaluation set:
- Loss: 0.2604
## Model description
More information needed
## Intended uses & limitations
More information needed
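That said, the framework versions listed below include PEFT, which suggests this checkpoint is a parameter-efficient adapter (e.g. LoRA) on top of deepseek-ai/deepseek-coder-1.3b-base rather than a full set of weights. A minimal loading sketch under that assumption (untested; the repository id is taken from the model name, and the expected prompt format is undocumented):

```python
# Minimal sketch; assumes this repo hosts a PEFT adapter for the base model.
import torch
from transformers import AutoTokenizer
from peft import AutoPeftModelForCausalLM

adapter_id = "yalhessi/lemexp-task1-lemma_object_full-deepseek-coder-1.3b-base-ddp-8lr"

tokenizer = AutoTokenizer.from_pretrained("deepseek-ai/deepseek-coder-1.3b-base")
model = AutoPeftModelForCausalLM.from_pretrained(adapter_id, torch_dtype=torch.bfloat16)

prompt = "..."  # task-specific prompt; the training data is not documented here
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```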
## Training and evaluation data
More information needed
## Training procedure
### Training hyperparameters
The following hyperparameters were used during training (mirrored in the configuration sketch after this list):
- learning_rate: 0.0008
- train_batch_size: 2
- eval_batch_size: 2
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- total_train_batch_size: 16
- total_eval_batch_size: 16
- optimizer: AdamW (torch) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: linear
- num_epochs: 18
- mixed_precision_training: Native AMP
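Taken together, these settings give an effective batch size of 2 × 8 = 16 across the 8 DDP workers. A hedged sketch of the equivalent Hugging Face `TrainingArguments`, assuming the standard `Trainer` was used (the actual training script, dataset, and PEFT/LoRA config are not documented in this card):

```python
from transformers import TrainingArguments

# Sketch only: mirrors the hyperparameters listed above.
args = TrainingArguments(
    output_dir="lemexp-task1-lemma_object_full-deepseek-coder-1.3b-base-ddp-8lr",
    learning_rate=8e-4,
    per_device_train_batch_size=2,  # x 8 GPUs (DDP) = 16 total
    per_device_eval_batch_size=2,   # x 8 GPUs (DDP) = 16 total
    seed=42,
    optim="adamw_torch",            # betas=(0.9, 0.999), eps=1e-08 are the defaults
    lr_scheduler_type="linear",
    num_train_epochs=18,
    fp16=True,                      # "Native AMP"; bf16 is also possible, not specified
    eval_strategy="steps",
    eval_steps=2902,                # matches the evaluation interval in the results table
)
```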
### Training results
| Training Loss | Epoch   | Step   | Validation Loss |
|:-------------:|:-------:|:------:|:---------------:|
| 0.5095        | 0.2000  | 2902   | 0.5062          |
| 0.4742        | 0.4001  | 5804   | 0.4674          |
| 0.4587        | 0.6001  | 8706   | 0.4537          |
| 0.4423        | 0.8001  | 11608  | 0.4393          |
| 0.4428        | 1.0001  | 14510  | 0.4372          |
| 0.4258        | 1.2002  | 17412  | 0.4330          |
| 0.422         | 1.4002  | 20314  | 0.4243          |
| 0.4149        | 1.6002  | 23216  | 0.4243          |
| 0.4114        | 1.8002  | 26118  | 0.4074          |
| 0.4061        | 2.0003  | 29020  | 0.4055          |
| 0.3983        | 2.2003  | 31922  | 0.4032          |
| 0.393         | 2.4003  | 34824  | 0.3958          |
| 0.3975        | 2.6004  | 37726  | 0.3877          |
| 0.3895        | 2.8004  | 40628  | 0.3948          |
| 0.3816        | 3.0004  | 43530  | 0.3825          |
| 0.373         | 3.2004  | 46432  | 0.3825          |
| 0.374         | 3.4005  | 49334  | 0.3808          |
| 0.3746        | 3.6005  | 52236  | 0.3770          |
| 0.3738        | 3.8005  | 55138  | 0.3729          |
| 0.3691        | 4.0006  | 58040  | 0.3665          |
| 0.3585        | 4.2006  | 60942  | 0.3661          |
| 0.3603        | 4.4006  | 63844  | 0.3672          |
| 0.358         | 4.6006  | 66746  | 0.3587          |
| 0.3491        | 4.8007  | 69648  | 0.3527          |
| 0.3513        | 5.0007  | 72550  | 0.3507          |
| 0.3434        | 5.2007  | 75452  | 0.3515          |
| 0.3398        | 5.4007  | 78354  | 0.3479          |
| 0.3406        | 5.6008  | 81256  | 0.3465          |
| 0.3355        | 5.8008  | 84158  | 0.3406          |
| 0.3312        | 6.0008  | 87060  | 0.3377          |
| 0.3226        | 6.2009  | 89962  | 0.3376          |
| 0.3163        | 6.4009  | 92864  | 0.3321          |
| 0.3211        | 6.6009  | 95766  | 0.3275          |
| 0.3165        | 6.8009  | 98668  | 0.3246          |
| 0.3178        | 7.0010  | 101570 | 0.3172          |
| 0.3028        | 7.2010  | 104472 | 0.3183          |
| 0.3043        | 7.4010  | 107374 | 0.3165          |
| 0.3032        | 7.6010  | 110276 | 0.3131          |
| 0.3042        | 7.8011  | 113178 | 0.3088          |
| 0.2973        | 8.0011  | 116080 | 0.3101          |
| 0.2855        | 8.2011  | 118982 | 0.3037          |
| 0.2825        | 8.4012  | 121884 | 0.3014          |
| 0.2865        | 8.6012  | 124786 | 0.3024          |
| 0.2814        | 8.8012  | 127688 | 0.2963          |
| 0.2807        | 9.0012  | 130590 | 0.2922          |
| 0.2686        | 9.2013  | 133492 | 0.2937          |
| 0.2679        | 9.4013  | 136394 | 0.2872          |
| 0.2678        | 9.6013  | 139296 | 0.2870          |
| 0.2609        | 9.8014  | 142198 | 0.2839          |
| 0.2623        | 10.0014 | 145100 | 0.2808          |
| 0.2497        | 10.2014 | 148002 | 0.2788          |
| 0.2451        | 10.4014 | 150904 | 0.2753          |
| 0.2473        | 10.6015 | 153806 | 0.2743          |
| 0.2456        | 10.8015 | 156708 | 0.2694          |
| 0.2449        | 11.0015 | 159610 | 0.2698          |
| 0.2296        | 11.2015 | 162512 | 0.2697          |
| 0.2321        | 11.4016 | 165414 | 0.2684          |
| 0.2291        | 11.6016 | 168316 | 0.2672          |
| 0.2296        | 11.8016 | 171218 | 0.2651          |
| 0.2296        | 12.0017 | 174120 | 0.2723          |
| 0.2663        | 12.2017 | 177022 | 0.2968          |
| 0.2718        | 12.4017 | 179924 | 0.2948          |
| 0.2773        | 12.6017 | 182826 | 0.2954          |
| 0.2749        | 12.8018 | 185728 | 0.2944          |
| 0.2754        | 13.0018 | 188630 | 0.2876          |
| 0.2629        | 13.2018 | 191532 | 0.2903          |
| 0.2606        | 13.4018 | 194434 | 0.2866          |
| 0.2665        | 13.6019 | 197336 | 0.2836          |
| 0.2621        | 13.8019 | 200238 | 0.2871          |
| 0.2629        | 14.0019 | 203140 | 0.2840          |
| 0.2498        | 14.2020 | 206042 | 0.2823          |
| 0.2533        | 14.4020 | 208944 | 0.2788          |
| 0.2505        | 14.6020 | 211846 | 0.2781          |
| 0.2515        | 14.8020 | 214748 | 0.2726          |
| 0.2505        | 15.0021 | 217650 | 0.2767          |
| 0.2387        | 15.2021 | 220552 | 0.2736          |
| 0.2348        | 15.4021 | 223454 | 0.2728          |
| 0.236         | 15.6022 | 226356 | 0.2686          |
| 0.2423        | 15.8022 | 229258 | 0.2664          |
| 0.2365        | 16.0022 | 232160 | 0.2671          |
| 0.2242        | 16.2022 | 235062 | 0.2659          |
| 0.2241        | 16.4023 | 237964 | 0.2680          |
| 0.2277        | 16.6023 | 240866 | 0.2634          |
| 0.2253        | 16.8023 | 243768 | 0.2624          |
| 0.2237        | 17.0023 | 246670 | 0.2628          |
| 0.215         | 17.2024 | 249572 | 0.2638          |
| 0.2141        | 17.4024 | 252474 | 0.2609          |
| 0.2145        | 17.6024 | 255376 | 0.2604          |
| 0.2101        | 17.8025 | 258278 | 0.2604          |
### Framework versions
- PEFT 0.14.0
- Transformers 4.47.0
- Pytorch 2.5.1+cu124
- Datasets 3.2.0
- Tokenizers 0.21.0
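For reproducibility, one way to pin this environment is a requirements file matching the versions above (the `+cu124` tag implies the CUDA 12.4 PyTorch wheel; this file is a sketch, not taken from the original training setup):

```text
# requirements.txt sketch, pinned to the versions listed above
peft==0.14.0
transformers==4.47.0
datasets==3.2.0
tokenizers==0.21.0
# torch 2.5.1+cu124: install with --index-url https://download.pytorch.org/whl/cu124
torch==2.5.1
```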