---
title: unsloth/DeepSeek-R1-Distill-Qwen-14B-unsloth-bnb-4bit (Research Training)
emoji: 🧪
colorFrom: blue
colorTo: indigo
sdk: gradio
sdk_version: 5.17.0
app_file: app.py
pinned: false
license: mit
---
# Model Fine-Tuning Project
## Overview
- **Goal**: Fine-tune unsloth/DeepSeek-R1-Distill-Qwen-14B-unsloth-bnb-4bit using a pre-tokenized JSONL dataset
- **Model**: `unsloth/DeepSeek-R1-Distill-Qwen-14B-unsloth-bnb-4bit`
- **Important**: The model is already 4-bit quantized; do not quantize it further
- **Dataset**: `phi4-cognitive-dataset`
⚠️ **RESEARCH TRAINING PHASE ONLY**: This space is being used for training purposes and does not provide interactive model outputs.
### Dataset Specs
- All entries are under 2048 tokens
- Fields: `prompt_number`, `article_id`, `conversations`
- Entries are processed in ascending `prompt_number` order
- Dataset is pre-tokenized; no additional tokenization is needed (see the loading sketch below)
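
A minimal sketch of how the JSONL file could be loaded and ordered, assuming one JSON object per line with the fields listed above (the file name here is illustrative, not the dataset's actual path):

```python
import json

def load_sorted_dataset(path: str) -> list[dict]:
    """Load the pre-tokenized JSONL file and sort entries by prompt_number."""
    with open(path, "r", encoding="utf-8") as f:
        entries = [json.loads(line) for line in f if line.strip()]
    # Ascending prompt_number order, as required by the training procedure
    return sorted(entries, key=lambda e: e["prompt_number"])

dataset = load_sorted_dataset("phi4-cognitive-dataset.jsonl")  # illustrative path
print(f"Loaded {len(dataset)} entries")
```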
### Hardware
- GPU: 1x L40S (48GB VRAM)
- RAM: 62GB
- CPU: 8 cores
## Environment Variables (.env)
- `HF_TOKEN`: Hugging Face API token
- `HF_USERNAME`: Hugging Face username
- `HF_SPACE_NAME`: Target space name
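
As a sketch, these could be read at startup with `python-dotenv` (assuming that package is among the dependencies):

```python
import os
from dotenv import load_dotenv

load_dotenv()  # reads .env from the working directory

HF_TOKEN = os.environ["HF_TOKEN"]            # Hugging Face API token
HF_USERNAME = os.environ["HF_USERNAME"]      # Hugging Face username
HF_SPACE_NAME = os.environ["HF_SPACE_NAME"]  # target Space name
```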
## Files
### 1. `app.py`
- Training status dashboard
- No interactive model demo (research phase only)
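
A minimal sketch of what a status-only Gradio dashboard could look like; the `read_status` helper is hypothetical and not part of the actual `app.py`:

```python
import gradio as gr

def read_status() -> str:
    # Hypothetical helper: in practice this would read checkpoint/metric
    # information written by run_cloud_training.py.
    return "Training in progress - see logs for current step and loss."

with gr.Blocks(title="Research Training Status") as demo:
    gr.Markdown("# DeepSeek-R1-Distill-Qwen-14B fine-tuning (research phase)")
    status = gr.Textbox(label="Status", interactive=False)
    refresh = gr.Button("Refresh")
    refresh.click(fn=read_status, outputs=status)

if __name__ == "__main__":
    demo.launch()
```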
### 2. `transformers_config.json`
- Configuration for Hugging Face Transformers
- Contains: model parameters, hardware settings, optimizer details
- Specifies pre-tokenized dataset handling
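
As an illustration, the training script could load this file as a plain dictionary; the top-level keys shown below are assumptions, not the file's actual schema:

```python
import json

with open("transformers_config.json") as f:
    config = json.load(f)

# Hypothetical sections mirroring the description above; the real file
# defines the authoritative key names.
model_cfg = config.get("model", {})          # model parameters
hardware_cfg = config.get("hardware", {})    # hardware settings
optimizer_cfg = config.get("optimizer", {})  # optimizer details
dataset_cfg = config.get("dataset", {})      # pre-tokenized dataset handling
```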
### 3. `run_cloud_training.py`
- Loads the pre-tokenized dataset, sorts it by `prompt_number`, and initiates training:
  1. Load and sort the JSONL entries by `prompt_number`
  2. Use the pre-tokenized `input_ids` directly (no re-tokenization)
  3. Initialize the model with parameters from the config file
  4. Execute training with metrics, checkpoints, and error handling
- Uses Hugging Face's `Trainer` API with a custom collator for pre-tokenized data (see the sketch below)
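
A sketch of what a collator for already-tokenized examples might look like, assuming each entry exposes an `input_ids` list; the padding value is an assumption:

```python
import torch

class PreTokenizedCollator:
    """Batch pre-tokenized examples without re-running a tokenizer."""

    def __init__(self, pad_token_id: int = 0):  # assumed pad id
        self.pad_token_id = pad_token_id

    def __call__(self, examples):
        max_len = max(len(e["input_ids"]) for e in examples)
        input_ids, attention_mask = [], []
        for e in examples:
            ids = list(e["input_ids"])
            pad = [self.pad_token_id] * (max_len - len(ids))
            input_ids.append(ids + pad)
            attention_mask.append([1] * len(ids) + [0] * len(pad))
        batch = {
            "input_ids": torch.tensor(input_ids, dtype=torch.long),
            "attention_mask": torch.tensor(attention_mask, dtype=torch.long),
        }
        # For causal LM training, labels mirror input_ids with padding masked out
        batch["labels"] = batch["input_ids"].masked_fill(
            batch["attention_mask"] == 0, -100
        )
        return batch
```

Padding each batch to its own longest entry (rather than a fixed 2048 tokens) keeps memory use proportional to the actual entry lengths.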
### 4. `requirements.txt`
- Python dependencies: `transformers`, `datasets`, `torch`, etc.
- Includes `unsloth` for optimized training
### 5. `upload_to_space.py`
- Updates the model and Space files directly using the Hugging Face Hub API (see the sketch below)
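
A minimal sketch using the `huggingface_hub` client; the folder path and repo id construction are assumptions, not the actual script:

```python
import os
from huggingface_hub import HfApi

api = HfApi(token=os.environ["HF_TOKEN"])

# Push the local project files to the target Space
api.upload_folder(
    folder_path=".",  # assumed: upload the current working directory
    repo_id=f"{os.environ['HF_USERNAME']}/{os.environ['HF_SPACE_NAME']}",
    repo_type="space",
)
```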
## Implementation Notes
### Best Practices
- Dataset is pre-tokenized and sorted by `prompt_number`
- Settings are stored in the config file rather than hardcoded
- Training parameters are tuned for the available hardware
- Gradient checkpointing and mixed-precision training reduce memory use (see the sketch below)
- Comprehensive logging for monitoring training progress
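
As a sketch, the memory-saving practices above map onto standard `TrainingArguments` flags; the values shown are illustrative, not the project's actual configuration:

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./results",
    gradient_checkpointing=True,    # trade extra compute for activation memory
    bf16=True,                      # mixed precision (the L40S supports bfloat16)
    per_device_train_batch_size=1,  # illustrative value
    gradient_accumulation_steps=8,  # illustrative value
    logging_steps=10,               # regular progress logging
    save_strategy="steps",
    save_steps=500,                 # periodic checkpoints
)
```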
### Model Repository
This space hosts a fine-tuned version of the [unsloth/DeepSeek-R1-Distill-Qwen-14B-unsloth-bnb-4bit](https://huggingface.co/unsloth/DeepSeek-R1-Distill-Qwen-14B-unsloth-bnb-4bit) model.
Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference