lemexp-task1-min_symbols_template_full-deepseek-coder-1.3b-base-ddp

This model is a fine-tuned version of deepseek-ai/deepseek-coder-1.3b-base on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 0.1978

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.001
  • train_batch_size: 2
  • eval_batch_size: 2
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 8
  • total_train_batch_size: 16
  • total_eval_batch_size: 16
  • optimizer: AdamW (OptimizerNames.ADAMW_TORCH) with betas=(0.9,0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: linear
  • num_epochs: 6
  • mixed_precision_training: Native AMP
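
The batch-size and scheduler entries above can be checked with a little arithmetic. The sketch below is illustrative only: it assumes no gradient accumulation and no warmup (the card states neither), and it estimates the number of steps per epoch from the results table, where step 14535 is logged at epoch 1.0001. The names `linear_lr`, `GRAD_ACCUM_STEPS`, and `WARMUP_STEPS` are hypothetical helpers, not part of the training script.

```python
# Values copied from the hyperparameter list above.
PER_DEVICE_TRAIN_BATCH_SIZE = 2
NUM_DEVICES = 8
BASE_LR = 1e-3
NUM_EPOCHS = 6

# Assumptions (not stated in the card): no gradient accumulation, no warmup.
GRAD_ACCUM_STEPS = 1
WARMUP_STEPS = 0

# Effective batch size across all devices.
total_train_batch_size = (
    PER_DEVICE_TRAIN_BATCH_SIZE * NUM_DEVICES * GRAD_ACCUM_STEPS
)

# Estimate from the results table: step 14535 is logged at epoch 1.0001.
steps_per_epoch = round(14535 / 1.0001)
total_steps = steps_per_epoch * NUM_EPOCHS


def linear_lr(step: int) -> float:
    """Learning rate under linear warmup followed by linear decay to zero."""
    if step < WARMUP_STEPS:
        return BASE_LR * step / max(1, WARMUP_STEPS)
    remaining = max(0, total_steps - step)
    return BASE_LR * remaining / max(1, total_steps - WARMUP_STEPS)


print("total_train_batch_size:", total_train_batch_size)
print("estimated total steps:", total_steps)
print("lr at start, midpoint, end:",
      linear_lr(0), linear_lr(total_steps // 2), linear_lr(total_steps))
```

Under these assumptions the per-device batch size of 2 on 8 devices reproduces the reported total of 16, and the learning rate falls linearly from 0.001 to zero over the run.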

Training results

Training Loss   Epoch    Step    Validation Loss
0.3666          0.2000   2907    0.3499
0.3395          0.4001   5814    0.3318
0.3267          0.6001   8721    0.3205
0.3174          0.8001   11628   0.3100
0.305           1.0001   14535   0.3098
0.3004          1.2002   17442   0.2988
0.3017          1.4002   20349   0.2947
0.2912          1.6002   23256   0.2879
0.2843          1.8002   26163   0.2795
0.2793          2.0003   29070   0.2770
0.27            2.2003   31977   0.2737
0.2697          2.4003   34884   0.2725
0.2641          2.6004   37791   0.2652
0.2586          2.8004   40698   0.2608
0.2564          3.0004   43605   0.2545
0.2464          3.2004   46512   0.2516
0.2428          3.4005   49419   0.2442
0.2395          3.6005   52326   0.2403
0.234           3.8005   55233   0.2367
0.2299          4.0006   58140   0.2311
0.2178          4.2006   61047   0.2271
0.2163          4.4006   63954   0.2227
0.2114          4.6006   66861   0.2175
0.2083          4.8007   69768   0.2150
0.2026          5.0007   72675   0.2100
0.1907          5.2007   75582   0.2065
0.1893          5.4007   78489   0.2020
0.1874          5.6008   81396   0.1982
0.1812          5.8008   84303   0.1978
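
The log above can be summarized programmatically. The following sketch copies the rows into a list and derives a few quantities: the checkpoint with the lowest validation loss, whether validation loss fell at every evaluation, and the final gap between validation and training loss. The perplexity figure assumes the reported loss is a mean per-token cross-entropy in nats, which is typical for causal language model training but is not stated in the card.

```python
import math

# (training_loss, epoch, step, validation_loss), copied from the table above.
ROWS = [
    (0.3666, 0.2000, 2907, 0.3499),
    (0.3395, 0.4001, 5814, 0.3318),
    (0.3267, 0.6001, 8721, 0.3205),
    (0.3174, 0.8001, 11628, 0.3100),
    (0.305, 1.0001, 14535, 0.3098),
    (0.3004, 1.2002, 17442, 0.2988),
    (0.3017, 1.4002, 20349, 0.2947),
    (0.2912, 1.6002, 23256, 0.2879),
    (0.2843, 1.8002, 26163, 0.2795),
    (0.2793, 2.0003, 29070, 0.2770),
    (0.27, 2.2003, 31977, 0.2737),
    (0.2697, 2.4003, 34884, 0.2725),
    (0.2641, 2.6004, 37791, 0.2652),
    (0.2586, 2.8004, 40698, 0.2608),
    (0.2564, 3.0004, 43605, 0.2545),
    (0.2464, 3.2004, 46512, 0.2516),
    (0.2428, 3.4005, 49419, 0.2442),
    (0.2395, 3.6005, 52326, 0.2403),
    (0.234, 3.8005, 55233, 0.2367),
    (0.2299, 4.0006, 58140, 0.2311),
    (0.2178, 4.2006, 61047, 0.2271),
    (0.2163, 4.4006, 63954, 0.2227),
    (0.2114, 4.6006, 66861, 0.2175),
    (0.2083, 4.8007, 69768, 0.2150),
    (0.2026, 5.0007, 72675, 0.2100),
    (0.1907, 5.2007, 75582, 0.2065),
    (0.1893, 5.4007, 78489, 0.2020),
    (0.1874, 5.6008, 81396, 0.1982),
    (0.1812, 5.8008, 84303, 0.1978),
]

val_losses = [row[3] for row in ROWS]

# Row with the lowest validation loss.
best_row = min(ROWS, key=lambda row: row[3])

# True if every evaluation improved on the previous one.
val_strictly_decreasing = all(a > b for a, b in zip(val_losses, val_losses[1:]))

# Validation minus training loss at the last logged step.
final_gap = ROWS[-1][3] - ROWS[-1][0]

# Assumption: loss is mean per-token cross-entropy in nats.
perplexity = math.exp(best_row[3])

print("best step:", best_row[2], "validation loss:", best_row[3])
print("validation loss strictly decreasing:", val_strictly_decreasing)
print("final validation/training gap:", round(final_gap, 4))
print("perplexity (under the stated assumption):", round(perplexity, 4))
```

The lowest validation loss in the log, 0.1978 at step 84303, matches the evaluation loss reported at the top of the card.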

Framework versions

  • PEFT 0.14.0
  • Transformers 4.47.0
  • Pytorch 2.5.1+cu124
  • Datasets 3.2.0
  • Tokenizers 0.21.0
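
Because PEFT appears in the framework versions, the checkpoint is most likely a PEFT adapter that must be loaded on top of the base model. The card does not document a loading procedure, so the following is a minimal sketch under that assumption. It also assumes the base model's tokenizer is unchanged. `load_model_and_tokenizer` is a hypothetical helper name, and the repository IDs are taken from this page.

```python
BASE_MODEL_ID = "deepseek-ai/deepseek-coder-1.3b-base"
ADAPTER_ID = (
    "yalhessi/lemexp-task1-min_symbols_template_full-deepseek-coder-1.3b-base-ddp"
)


def load_model_and_tokenizer(base_id: str = BASE_MODEL_ID,
                             adapter_id: str = ADAPTER_ID):
    """Load the base model and attach the adapter (assumed to be a PEFT adapter)."""
    # Imported lazily so this module can be imported without the heavy
    # dependencies installed.
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # Assumption: the adapter reuses the base model's tokenizer.
    tokenizer = AutoTokenizer.from_pretrained(base_id)
    base_model = AutoModelForCausalLM.from_pretrained(base_id)

    # Wrap the base model with the adapter weights from the Hub.
    model = PeftModel.from_pretrained(base_model, adapter_id)
    model.eval()
    return model, tokenizer
```

Calling `load_model_and_tokenizer()` downloads both repositories from the Hugging Face Hub. The expected prompt format is not described in this card, so any generation call would need to follow the format used in the training data.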

Model tree for yalhessi/lemexp-task1-min_symbols_template_full-deepseek-coder-1.3b-base-ddp

Adapter (170): this model