Commit d605834 by ptrdvn · verified · 1 Parent(s): b3ee0d4

Update README.md

Files changed (1)
  1. README.md +56 -0
README.md CHANGED
@@ -34,6 +34,62 @@ More information needed

  ## Training procedure

+ ```yaml
+ ### model
+ model_name_or_path: deepseek-ai/DeepSeek-R1-Distill-Qwen-7B
+
+ ### method
+ stage: sft
+ do_train: true
+ finetuning_type: full
+ deepspeed: /root/LLaMA-Factory/examples/deepspeed/ds_z2_config.json
+
+ ### dataset
+ dataset: distilabel-reasoning-R1-Llama-70B-ja-train
+ template: qwen
+ cutoff_len: 4500
+ overwrite_cache: true
+ preprocessing_num_workers: 16
+ packing: true
+
+ ### output
+ output_dir: /root/train_outputs/DeepSeek-R1-Distill-Qwen-7B/distilabel-reasoning-R1-Llama-70B-ja-train
+ logging_steps: 1
+ save_steps: 0.99999
+ plot_loss: true
+ overwrite_output_dir: true
+
+ ### train
+ per_device_train_batch_size: 1
+ gradient_accumulation_steps: 1
+ learning_rate: 1.0e-5
+ num_train_epochs: 1.0
+ lr_scheduler_type: cosine
+ warmup_ratio: 0.01
+ bf16: true
+ ddp_timeout: 180000000
+
+ ### eval
+ val_size: 0.01
+ per_device_eval_batch_size: 1
+ eval_strategy: steps
+ eval_steps: 0.1
+ ```
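
The fractional `save_steps` and `eval_steps` values in the config are easier to read once converted to absolute step counts. The sketch below assumes the Hugging Face `Trainer` convention, in which a value below 1 is treated as a ratio of the total number of training steps and rounded up. The helper name `ratio_to_steps` and the total of 1000 steps are hypothetical, chosen only for illustration.

```python
import math


def ratio_to_steps(value: float, max_steps: int) -> int:
    """Convert a step setting to an absolute step interval.

    Assumption: values below 1 are ratios of max_steps, rounded up;
    values of 1 or more are already absolute step counts.
    """
    if value <= 0:
        raise ValueError("step setting must be positive")
    if value < 1:
        return math.ceil(max_steps * value)
    return int(value)


if __name__ == "__main__":
    # Hypothetical total; the real value depends on dataset size and GPU count.
    max_steps = 1000
    print("eval every", ratio_to_steps(0.1, max_steps), "steps")
    print("save every", ratio_to_steps(0.99999, max_steps), "steps")
```

Under that assumption, `save_steps: 0.99999` yields a single checkpoint at or near the final step, while `eval_steps: 0.1` evaluates roughly ten times over the run.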
+
+ ```shell
+ echo '{
+   "distilabel-reasoning-R1-Llama-70B-ja-train": {
+     "hf_hub_url": "lightblue/distilabel-reasoning-R1-Llama-70B-ja-train",
+     "formatting": "sharegpt"
+   }
+ }' > /root/LLaMA-Factory/data/dataset_info.json
+
+ cd /root/LLaMA-Factory && llamafactory-cli train /root/reasoning_train.yaml
+
+ rm -r /root/train_outputs/DeepSeek-R1-Distill-Qwen-7B/distilabel-reasoning-R1-Llama-70B-ja-train/checkpoint*
+ huggingface-cli upload lightblue/DeepSeek-R1-Distill-Qwen-7B-Japanese /root/train_outputs/DeepSeek-R1-Distill-Qwen-7B/distilabel-reasoning-R1-Llama-70B-ja-train
+ ```
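
The `echo` command above overwrites `dataset_info.json` with a single entry. As a possible alternative, the following sketch builds the same entry with Python's `json` module and merges it into any existing file, which avoids shell quoting mistakes. The function name `register_dataset` is hypothetical; the keys `hf_hub_url` and `formatting` are the ones used in the command above. Note that merging differs from the original behaviour, which replaces the whole file.

```python
import json
import tempfile
from pathlib import Path


def register_dataset(info_path: Path, name: str, hub_url: str,
                     formatting: str = "sharegpt") -> dict:
    """Add or update one dataset entry in a dataset_info.json file."""
    info = {}
    if info_path.exists():
        # Keep entries that are already registered.
        info = json.loads(info_path.read_text(encoding="utf-8"))
    info[name] = {"hf_hub_url": hub_url, "formatting": formatting}
    info_path.parent.mkdir(parents=True, exist_ok=True)
    info_path.write_text(
        json.dumps(info, indent=2, ensure_ascii=False) + "\n",
        encoding="utf-8",
    )
    return info


if __name__ == "__main__":
    # Write to a temporary directory so the sketch has no side effects.
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "dataset_info.json"
        result = register_dataset(
            path,
            "distilabel-reasoning-R1-Llama-70B-ja-train",
            "lightblue/distilabel-reasoning-R1-Llama-70B-ja-train",
        )
        print(json.dumps(result, indent=2))
```

In the setup shown in this commit, `info_path` would be `/root/LLaMA-Factory/data/dataset_info.json`.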
+
  ### Training hyperparameters

  The following hyperparameters were used during training: