# Model Card for eagle0504/fine-tuned-DeepSeek-R1-Distill-Qwen-1.5B-openai-gsm8k-enhanced-v1
This model is a fine-tuned version of `deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B`, adapted with LoRA (Low-Rank Adaptation) on an enhanced version of the GSM8K dataset. It is designed for English causal language modeling tasks, with a focus on solving math word problems.
## 🧠 What’s Inside
- Base Model: `deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B`
- LoRA Adaptation: Applied to the `q_proj` and `v_proj` layers for efficient fine-tuning.
- Target Task: Causal language modeling (`TaskType.CAUSAL_LM`) on math reasoning problems.
- Training Dataset: `eagle0504/openai-gsm8k-enhanced-using-together-ai-deepseek-train8k-test1k-v1`
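The training dataset can be inspected directly from the Hub before building any pipeline. Below is a minimal sketch using the `datasets` library; the split and column layout is printed rather than assumed, since the card does not document the exact schema.

```python
from datasets import load_dataset

# Pull the enhanced GSM8K dataset used for fine-tuning from the Hugging Face Hub.
dataset = load_dataset(
    "eagle0504/openai-gsm8k-enhanced-using-together-ai-deepseek-train8k-test1k-v1"
)

# Report the available splits, columns, and sizes instead of assuming a schema.
print(dataset)
for split_name, split in dataset.items():
    print(split_name, split.column_names, len(split))
```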
## 🛠️ Fine-Tuning Details

This model was fine-tuned using the `transformers` and `peft` libraries. The training leveraged the Parameter-Efficient Fine-Tuning (PEFT) technique with LoRA for memory efficiency and faster convergence.
### LoRA Configuration

```python
from peft import get_peft_model, LoraConfig, TaskType

lora_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=8,
    lora_alpha=16,
    target_modules=["q_proj", "v_proj"],
    lora_dropout=0.1,
    bias="none",
)

model = get_peft_model(model, lora_config)
```
- `r=8`: Low-rank dimensionality
- `lora_alpha=16`: Scaling factor
- `lora_dropout=0.1`: Regularization
- `target_modules=["q_proj", "v_proj"]`: Applies LoRA to the attention projection layers
- `bias="none"`: No bias added to the adapted layers
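For context, here is a minimal sketch of how the base model might be loaded and wrapped with this configuration. The exact loading options used during training are not documented in this card, so the `float16` dtype below is an illustrative assumption.

```python
import torch
from peft import get_peft_model, LoraConfig, TaskType
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B"

# Load the base model and tokenizer; float16 is an assumption, not a documented choice.
tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype=torch.float16)

# Same LoRA configuration as shown above.
lora_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=8,
    lora_alpha=16,
    target_modules=["q_proj", "v_proj"],
    lora_dropout=0.1,
    bias="none",
)

# Wrap the base model so that only the low-rank adapter weights are trainable.
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```

With `r=8` applied only to `q_proj` and `v_proj`, the trainable adapter weights are a small fraction of the 1.5B base parameters.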
### TrainingArguments Configuration

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,
    warmup_steps=100,
    num_train_epochs=3,
    learning_rate=2e-4,
    fp16=True,
    logging_steps=10,
    output_dir="outputs",
    report_to="none",
    remove_unused_columns=False,
)
```
- Effective batch size: 1 × 8 = 8 (via gradient accumulation)
- Epochs: 3
- Learning rate: 2e-4
- Mixed precision: Enabled (`fp16=True`) for speed and efficiency
- Logging: Every 10 steps
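Below is a hedged sketch of how these arguments might be wired into a `Trainer` for causal language modeling. The actual training script is not part of this card, so the tokenization scheme and the `question`/`answer` column names are assumptions borrowed from the original GSM8K layout; adjust them to the enhanced dataset's real schema.

```python
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

base_id = "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B"
tokenizer = AutoTokenizer.from_pretrained(base_id)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

# In the actual run this would be the PEFT-wrapped model from the LoRA section above;
# a plain base model is loaded here only to keep the sketch self-contained.
model = AutoModelForCausalLM.from_pretrained(base_id)

dataset = load_dataset(
    "eagle0504/openai-gsm8k-enhanced-using-together-ai-deepseek-train8k-test1k-v1"
)

def tokenize(batch):
    # Assumption: "question"/"answer" columns as in the original GSM8K; check the real schema.
    texts = [q + "\n" + a for q, a in zip(batch["question"], batch["answer"])]
    return tokenizer(texts, truncation=True, max_length=512)

tokenized = dataset["train"].map(
    tokenize, batched=True, remove_columns=dataset["train"].column_names
)

training_args = TrainingArguments(
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,
    warmup_steps=100,
    num_train_epochs=3,
    learning_rate=2e-4,
    fp16=True,
    logging_steps=10,
    output_dir="outputs",
    report_to="none",
    remove_unused_columns=False,
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized,
    # mlm=False selects causal (next-token) language modeling labels.
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```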
## 💡 Intended Use
This model is intended for educational and research purposes in math word problem solving, reasoning tasks, and language modeling. It can be used as-is or further fine-tuned on domain-specific datasets.
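For example, assuming this repository hosts the LoRA adapter weights (loaded on top of the base model with `peft`), inference could look like the following sketch; the prompt is an illustrative math word problem, not the exact format used during training.

```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B"
adapter_id = "eagle0504/fine-tuned-DeepSeek-R1-Distill-Qwen-1.5B-openai-gsm8k-enhanced-v1"

tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.float16, device_map="auto"
)
# Attach the fine-tuned LoRA adapters to the base model.
model = PeftModel.from_pretrained(model, adapter_id)

# Example math word problem; prompt formatting here is an assumption.
prompt = (
    "A bakery sells 24 muffins in the morning and twice as many in the afternoon. "
    "How many muffins does it sell in total?"
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```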
## 📜 License
This model is released under the MIT License.
## 🚧 Limitations and Biases

This model inherits limitations from both the underlying dataset (GSM8K and its enhanced derivative) and the DeepSeek base model, including potential biases in language and reasoning patterns. It may produce incorrect reasoning or answers, and caution is advised when using it in high-stakes or real-world applications.