lightblue/DeepSeek-R1-Distill-Qwen-7B-Japanese

ptrdvn committed on Jan 24

Commit d605834 · verified · 1 Parent(s): b3ee0d4

Update README.md

Files changed (1)

README.md +56 -0

@@ -34,6 +34,62 @@ More information needed
 ## Training procedure
+```yaml
+### model
+model_name_or_path: deepseek-ai/DeepSeek-R1-Distill-Qwen-7B
+### method
+stage: sft
+do_train: true
+finetuning_type: full
+deepspeed: /root/LLaMA-Factory/examples/deepspeed/ds_z2_config.json
+### dataset
+dataset: distilabel-reasoning-R1-Llama-70B-ja-train
+template: qwen
+cutoff_len: 4500
+overwrite_cache: true
+preprocessing_num_workers: 16
+packing: true
+### output
+output_dir: /root/train_outputs/DeepSeek-R1-Distill-Qwen-7B/distilabel-reasoning-R1-Llama-70B-ja-train
+logging_steps: 1
+save_steps: 0.99999
+plot_loss: true
+overwrite_output_dir: true
+### train
+per_device_train_batch_size: 1
+gradient_accumulation_steps: 1
+learning_rate: 1.0e-5
+num_train_epochs: 1.0
+lr_scheduler_type: cosine
+warmup_ratio: 0.01
+bf16: true
+ddp_timeout: 180000000
+### eval
+val_size: 0.01
+per_device_eval_batch_size: 1
+eval_strategy: steps
+eval_steps: 0.1
+```
+```shell
+echo '{
+  "distilabel-reasoning-R1-Llama-70B-ja-train": {
+    "hf_hub_url": "lightblue/distilabel-reasoning-R1-Llama-70B-ja-train",
+    "formatting": "sharegpt"
+  }
+}' > /root/LLaMA-Factory/data/dataset_info.json
+cd /root/LLaMA-Factory && llamafactory-cli train /root/reasoning_train.yaml
+rm -r /root/train_outputs/DeepSeek-R1-Distill-Qwen-7B/distilabel-reasoning-R1-Llama-70B-ja-train/checkpoint*
+huggingface-cli upload lightblue/DeepSeek-R1-Distill-Qwen-7B-Japanese /root/train_outputs/DeepSeek-R1-Distill-Qwen-7B/distilabel-reasoning-R1-Llama-70B-ja-train
+```
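
Review note: the `echo ... > dataset_info.json` step above replaces the whole file, so any datasets already registered there are dropped. For setups where that matters, a minimal Python sketch of a merging variant could look like the following. The helper name `register_dataset` and the demo entry `existing` are hypothetical and not part of this commit or of LLaMA-Factory; only the entry name, `hf_hub_url`, and `formatting` values come from the commands above.

```python
import json
import tempfile
from pathlib import Path


def register_dataset(info_path, name, hf_hub_url, formatting="sharegpt"):
    """Add or update one entry in a dataset_info.json-style file.

    Hypothetical helper: entries already present in the file are preserved.
    """
    path = Path(info_path)
    # Load existing entries if the file is already there; otherwise start empty.
    entries = json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
    entries[name] = {"hf_hub_url": hf_hub_url, "formatting": formatting}
    path.write_text(
        json.dumps(entries, indent=2, ensure_ascii=False) + "\n", encoding="utf-8"
    )
    return entries


# Demo in a temporary directory so nothing under /root is touched.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "dataset_info.json"
    # Hypothetical pre-existing entry, used only to show that it survives.
    demo_path.write_text('{"existing": {"file_name": "existing.json"}}', encoding="utf-8")
    merged = register_dataset(
        demo_path,
        "distilabel-reasoning-R1-Llama-70B-ja-train",
        "lightblue/distilabel-reasoning-R1-Llama-70B-ja-train",
    )
    print(sorted(merged))
```

In the setup shown in this commit, the first argument would be `/root/LLaMA-Factory/data/dataset_info.json`. Overwriting the file, as the commit does, is simpler and is reasonable when this is the only dataset used in the run.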
 ### Training hyperparameters
 The following hyperparameters were used during training:
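
Review note: `save_steps: 0.99999` and `eval_steps: 0.1` in the YAML above are fractional. Assuming the Hugging Face Trainer convention that values below 1 are read as a fraction of the total number of training steps and rounded up, the sketch below shows how they might resolve to absolute step counts. The function name `resolve_steps` and the total of 1000 steps are hypothetical; the real total depends on the packed dataset size and the number of GPUs, neither of which is stated in this commit.

```python
import math


def resolve_steps(value, max_steps):
    """Convert a step setting to an absolute step count.

    Values below 1 are treated as a fraction of max_steps and rounded up;
    values of 1 or more are treated as absolute counts.
    """
    if value < 1:
        return math.ceil(max_steps * value)
    return int(value)


# Hypothetical total number of optimizer steps (not given in the commit).
max_steps = 1000

eval_every = resolve_steps(0.1, max_steps)      # eval_steps from the config
save_every = resolve_steps(0.99999, max_steps)  # save_steps from the config
warmup = math.ceil(max_steps * 0.01)            # warmup_ratio from the config

print(f"evaluate every {eval_every} steps")
print(f"save every {save_every} steps")
print(f"warm up for {warmup} steps")
```

Under this reading, `save_steps: 0.99999` rounds up to the final step, so the only intermediate checkpoint lands at the end of training. That would fit the later `rm -r .../checkpoint*` cleanup, which removes checkpoint directories before the upload.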