# Phi-4 Training Critical Deployment Checklist

## Essential Configuration Requirements

### 1. Model Configuration
- [ ] Model name: `unsloth/phi-4-unsloth-bnb-4bit`
- [ ] BF16 precision enabled, FP16 disabled
- [ ] Appropriate sequence length (2048)
- [ ] LoRA parameters correctly configured (r: 32, alpha: 16) (see the sketch below)
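The items above might translate into code roughly as follows. This is a minimal sketch assuming the Unsloth `FastLanguageModel` API; the `target_modules` list is illustrative and not taken from the original config.

```python
# Sketch only: assumes the Unsloth FastLanguageModel API; target_modules are illustrative.
import torch
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/phi-4-unsloth-bnb-4bit",
    max_seq_length=2048,       # appropriate sequence length
    dtype=torch.bfloat16,      # BF16 enabled, FP16 disabled
    load_in_4bit=True,         # pre-quantized bnb-4bit checkpoint
)

model = FastLanguageModel.get_peft_model(
    model,
    r=32,                      # LoRA rank
    lora_alpha=16,             # LoRA alpha
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # illustrative choice
    use_gradient_checkpointing=True,
)
```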

### 2. Hardware & Resource Management
- [ ] Per-device batch size ≤ 16
- [ ] Gradient accumulation steps ≥ 3
- [ ] Gradient checkpointing enabled
- [ ] Memory usage limits properly set (85% of GPU capacity) (sketch below)
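One way to enforce these resource limits is sketched below, using Hugging Face `TrainingArguments` and PyTorch's per-process memory cap; the output directory is a placeholder.

```python
# Sketch: caps each GPU process at ~85% of device memory and applies the batch limits above.
import torch
from transformers import TrainingArguments

# Memory usage limit: 85% of GPU capacity per process (per checklist item).
for device_id in range(torch.cuda.device_count()):
    torch.cuda.set_per_process_memory_fraction(0.85, device=device_id)

training_args = TrainingArguments(
    output_dir="phi4-checkpoints",     # placeholder path
    per_device_train_batch_size=16,    # ≤ 16 per device
    gradient_accumulation_steps=3,     # ≥ 3
    gradient_checkpointing=True,
    bf16=True,
    fp16=False,
)
```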

### 3. Critical Dataset Handling Rules
- [ ] **NO REORDERING of dataset entries** - original order must be preserved
- [ ] **NO COMBINING of separate entries** - each entry must remain distinct
- [ ] **SEQUENTIAL PROCESSING required** - entries must be processed one after another
- [ ] `sort_by_id` and `maintain_paper_order` flags properly set to preserve data sequence
- [ ] Sequential sampler used with no shuffling (`"shuffle": false`)
- [ ] Dataset sequential integrity verified with validation samples (see the snippet below)
- [ ] Conversation structure preserved (original format maintained)
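A sketch of how sequential loading and the integrity check could look. It assumes each entry carries an `id` field whose ascending order matches the original sequence (the intent behind `sort_by_id` / `maintain_paper_order`); that schema is an assumption, not confirmed from the source.

```python
# Sketch: strictly sequential loading plus an order-integrity spot check.
# Assumes a hypothetical per-entry "id" field that increases in original order.
from torch.utils.data import DataLoader, SequentialSampler

def build_sequential_loader(dataset, batch_size, collate_fn=None):
    # SequentialSampler yields indices 0, 1, 2, ... so entries are read in their
    # original stored order and never shuffled or reordered.
    return DataLoader(
        dataset,
        batch_size=batch_size,
        sampler=SequentialSampler(dataset),
        collate_fn=collate_fn,
    )

def verify_sequential_integrity(dataset, num_samples=10):
    # Spot-check the first few entries: their ids should already be in ascending order.
    ids = [dataset[i]["id"] for i in range(min(num_samples, len(dataset)))]
    assert ids == sorted(ids), f"Sequential order violated in first entries: {ids}"
```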

### 4. Essential Error Handling
- [ ] Clear error catching for dataset loading issues
- [ ] Memory tracking at key training points
- [ ] Low-verbosity logging for HF Space compatibility (see the example below)
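A minimal sketch of the error catching, memory tracking, and quiet logging described above; the dataset name and load call are placeholders.

```python
# Sketch: dataset-load error handling, GPU memory tracking, and low-verbosity logging.
import logging
import torch
from datasets import load_dataset

logging.basicConfig(level=logging.WARNING)   # keep HF Space logs quiet
logger = logging.getLogger(__name__)

def load_training_dataset(name_or_path):
    # Clear error catching around dataset loading.
    try:
        return load_dataset(name_or_path, split="train")
    except Exception as exc:
        logger.error("Failed to load dataset %s: %s", name_or_path, exc)
        raise

def log_gpu_memory(stage):
    # Memory tracking at key training points (e.g. after model load, after first batch).
    for device_id in range(torch.cuda.device_count()):
        peak_gb = torch.cuda.max_memory_allocated(device_id) / 1024**3
        logger.warning("[%s] GPU %d peak memory: %.1f GB", stage, device_id, peak_gb)
```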

### 5. Training Core Requirements
- [ ] Appropriate learning rate (2e-5)
- [ ] Proper checkpointing frequency
- [ ] Hub settings correctly configured for model saving (sketch below)
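Extending the earlier `TrainingArguments` sketch, the learning rate, checkpoint frequency, and Hub settings might be wired in as follows; the repo id and step count are placeholders.

```python
# Sketch: core training, checkpointing, and Hub settings (repo id and save_steps are placeholders).
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="phi4-checkpoints",
    learning_rate=2e-5,                        # per checklist
    save_strategy="steps",
    save_steps=100,                            # placeholder checkpoint frequency
    push_to_hub=True,
    hub_model_id="your-org/phi4-finetune",     # placeholder repo id
    hub_strategy="checkpoint",                 # push checkpoints so training can resume
)
```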

---

## Pre-Deployment Verification

| Requirement | Status | Notes |
|-------------|--------|-------|
| Data sequential integrity | | Confirm entries are processed in order |
| GPU memory within limits | | Check peak memory doesn't exceed 20GB per GPU |
| Training batch verification | | Verify the first few batches maintain proper order |
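A small pre-flight check corresponding to the table above; the 20 GB threshold comes from the notes column, while the per-entry `id` field is the same assumed schema as in the dataset-handling sketch.

```python
# Sketch: pre-deployment checks mirroring the verification table above.
import torch

def preflight_checks(dataloader, num_batches=3, max_gpu_gb=20.0):
    # Training batch verification: the first few batches must keep the original entry order.
    last_id = -1
    for batch_idx, batch in enumerate(dataloader):
        if batch_idx >= num_batches:
            break
        for entry_id in batch["id"]:             # assumes a per-entry "id" field
            assert int(entry_id) > last_id, "Batch entries out of order"
            last_id = int(entry_id)

    # GPU memory within limits: peak must stay under 20 GB per 24 GB L4 card.
    for device_id in range(torch.cuda.device_count()):
        peak_gb = torch.cuda.max_memory_allocated(device_id) / 1024**3
        assert peak_gb < max_gpu_gb, f"GPU {device_id} peak {peak_gb:.1f} GB exceeds limit"
```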

---

**Current Hardware**: 4× NVIDIA L4 GPUs (24GB VRAM each)

**CRITICAL REMINDER**: Data sequence preservation is the highest priority - any shuffling, reordering, or combining of entries will compromise model quality.

*Last Updated: 2025-03-09*