ryanmarten committed
Commit 9c18580 · verified · 1 Parent(s): 53d8a41

Model save

Files changed (1)
README.md +8 -9
README.md CHANGED
@@ -4,19 +4,18 @@ license: apache-2.0
 base_model: Qwen/Qwen2.5-7B-Instruct
 tags:
 - llama-factory
-- full
 - generated_from_trainer
 model-index:
-- name: fig1_all_s1
+- name: s1
   results: []
 ---
 
 <!-- This model card has been generated automatically according to the information the Trainer had access to. You
 should probably proofread and complete it, then remove this comment. -->
 
-# fig1_all_s1
+# s1
 
-This model is a fine-tuned version of [Qwen/Qwen2.5-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct) on the mlfoundations-dev/fig1_all_s1 dataset.
+This model is a fine-tuned version of [Qwen/Qwen2.5-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct) on an unknown dataset.
 
 ## Model description
 
@@ -40,11 +39,11 @@ The following hyperparameters were used during training:
 - eval_batch_size: 8
 - seed: 42
 - distributed_type: multi-GPU
-- num_devices: 8
-- gradient_accumulation_steps: 12
+- num_devices: 4
+- gradient_accumulation_steps: 24
 - total_train_batch_size: 96
-- total_eval_batch_size: 64
-- optimizer: Use OptimizerNames.ADAMW_TORCH with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
+- total_eval_batch_size: 32
+- optimizer: Use adamw_torch with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
 - lr_scheduler_type: cosine
 - lr_scheduler_warmup_ratio: 0.1
 - num_epochs: 7.0
@@ -56,6 +55,6 @@ The following hyperparameters were used during training:
 ### Framework versions
 
 - Transformers 4.46.1
-- Pytorch 2.3.0
+- Pytorch 2.6.0+cu124
 - Datasets 3.1.0
 - Tokenizers 0.20.3
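
The hyperparameter changes in the second hunk are internally consistent: under the usual HF Trainer accounting, the effective train batch size is per-device batch size × num_devices × gradient_accumulation_steps, so halving the device count while doubling the accumulation steps keeps the total at 96. A minimal sanity check of that arithmetic (the per-device train batch size of 1 is inferred from the totals, not shown in the diff):

```python
# total_train_batch_size = per_device_batch * num_devices * grad_accum_steps
per_device_train_batch_size = 1  # assumption: implied by the totals, not in the diff

old_total = per_device_train_batch_size * 8 * 12  # num_devices=8, gradient_accumulation_steps=12
new_total = per_device_train_batch_size * 4 * 24  # num_devices=4, gradient_accumulation_steps=24

assert old_total == new_total == 96  # matches total_train_batch_size in both revisions
print(old_total, new_total)  # 96 96
```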