# Free H200 Training: Nano-Coder on Hugging Face

This guide shows you how to train a nano-coder model using **Hugging Face's free H200 GPU access** (4 minutes daily).
## What You Get

- **Free H200 GPU**: 4 minutes per day
- **No Credit Card Required**: completely free
- **Easy Setup**: just a few clicks
- **Model Sharing**: automatic upload to the HF Hub
## Quick Start

### Option 1: Hugging Face Space (Recommended)

1. **Create an HF Space:**

   ```bash
   huggingface-cli repo create nano-coder-free --type space
   ```

2. **Upload files:**
   - Upload all the Python files to your Space
   - Make sure `app.py` is in the root directory

3. **Configure the Space:**
   - Set **Hardware**: H200 (free tier)
   - Set **Python Version**: 3.9+
   - Set **Requirements**: `requirements.txt`

4. **Launch training:**
   - Go to your Space URL
   - Click "Start Free H200 Training"
   - Wait for training to complete (~3.5 minutes)
### Option 2: Local Setup with the HF Free Tier

1. **Install dependencies:**

   ```bash
   pip install -r requirements.txt
   ```

2. **Set your HF token:**

   ```bash
   export HF_TOKEN="your_token_here"
   ```

3. **Run free training:**

   ```bash
   python hf_free_training.py
   ```
## Model Configuration (Free Tier)

| Parameter | Free Tier | Full Model |
|-----------|-----------|------------|
| **Layers** | 6 | 12 |
| **Heads** | 6 | 12 |
| **Embedding** | 384 | 768 |
| **Context** | 512 | 1024 |
| **Parameters** | ~15M | ~124M |
| **Training Time** | 3.5 min | 2-4 hours |
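The parameter counts above can be roughly sanity-checked with the standard GPT sizing rule of thumb: about 12·n_layer·n_embd² weights in the transformer blocks plus vocab_size·n_embd for the (tied) token embedding. The vocabulary size is an assumption; GPT-2's 50,257 is used below.

```python
def approx_gpt_params(n_layer: int, n_embd: int, vocab_size: int = 50257) -> int:
    """Rough GPT parameter count: transformer blocks + tied token embedding.

    Each block contributes ~12 * n_embd**2 weights (attention + MLP);
    position embeddings, biases, and LayerNorms are small and ignored.
    """
    return 12 * n_layer * n_embd**2 + vocab_size * n_embd


# Full model from the table: 12 layers, 768-dim -> roughly 124M
full = approx_gpt_params(12, 768)
# Free-tier model: 6 layers, 384-dim
free = approx_gpt_params(6, 384)
```

With GPT-2's vocabulary this gives ~123.5M for the full model, matching the table's ~124M; the same formula gives ~30M for the free-tier shape, so the table's ~15M presumably reflects a much smaller trained vocabulary. Treat it as a rough check only.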
## Time Management

- **Daily Limit**: 4 minutes of H200 time
- **Training Time**: 3.5 minutes (leaves a safety buffer)
- **Automatic Stop**: the script stops itself before the time limit
- **Daily Reset**: a fresh 4 minutes every day at midnight UTC
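The automatic-stop behaviour can be sketched as a small time-budget guard that the training loop polls each iteration. The class and the injectable clock are illustrative, not taken from the actual script; the limits come from the figures above.

```python
import time

DAILY_LIMIT_S = 4 * 60        # free-tier quota per day, per this guide
MAX_TRAINING_TIME = 3.5 * 60  # training budget, leaving a safety buffer


class TimeBudget:
    """Track elapsed time so training stops before the quota runs out."""

    def __init__(self, budget_s: float, clock=time.monotonic):
        self.budget_s = budget_s
        self.clock = clock  # injectable for testing
        self.start = clock()

    def remaining(self) -> float:
        """Seconds left in the budget (negative once exhausted)."""
        return self.budget_s - (self.clock() - self.start)

    def exhausted(self) -> bool:
        return self.remaining() <= 0
```

A training loop would then do something like `if budget.exhausted(): save_checkpoint(); break` at the top of each iteration.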
## Features

### Training Features

- ✅ **Automatic Time Tracking**: stops before the limit
- ✅ **Frequent Checkpoints**: every 200 iterations
- ✅ **HF Hub Upload**: models saved automatically
- ✅ **Wandb Logging**: real-time metrics
- ✅ **Progress Monitoring**: time-remaining display

### Generation Features

- ✅ **Interactive UI**: Gradio interface
- ✅ **Custom Prompts**: start from any Python code
- ✅ **Adjustable Parameters**: temperature, max tokens
- ✅ **Real-time Generation**: instant results
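The checkpoint-every-200-iterations behaviour amounts to a simple predicate in the training loop; a sketch (the constant name is illustrative):

```python
CHECKPOINT_INTERVAL = 200  # iterations between checkpoints, per the list above


def should_checkpoint(iter_num: int) -> bool:
    """Save every CHECKPOINT_INTERVAL iterations, skipping iteration 0."""
    return iter_num > 0 and iter_num % CHECKPOINT_INTERVAL == 0
```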
## File Structure

```
nano-coder-free/
├── app.py                   # HF Space app
├── hf_free_training.py      # Free H200 training script
├── prepare_code_dataset.py  # Dataset preparation
├── sample_nano_coder.py     # Code generation
├── requirements.txt         # Dependencies
├── model.py                 # nanoGPT model
├── configurator.py          # Configuration
└── README_free_H200.md      # This file
```
## Customization

### Adjust Training Parameters

Edit `hf_free_training.py`:

```python
# Model size (smaller = faster training)
n_layer = 4   # even smaller
n_head = 4    # even smaller
n_embd = 256  # even smaller

# Training time budget in seconds (be conservative)
MAX_TRAINING_TIME = 3.0 * 60  # 3 minutes

# Batch size (larger = faster, if it fits in memory)
batch_size = 128
```

### Change Dataset

```python
# In prepare_code_dataset.py
dataset = load_dataset("your-dataset")  # your own dataset
```
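`prepare_code_dataset.py` presumably follows nanoGPT's convention of writing token ids to `train.bin`/`val.bin` as flat uint16 arrays. A minimal sketch of that on-disk step, using a byte-level stand-in in place of the real tokenizer:

```python
import numpy as np


def write_bin(token_ids, path: str) -> None:
    """Write token ids as raw uint16 values, nanoGPT's bin-file format."""
    np.array(token_ids, dtype=np.uint16).tofile(path)


def prepare_split(text: str, path: str) -> int:
    """Tokenize one split and write it to disk; returns the token count.

    Byte-level encoding is a stand-in here; the real script would use
    a trained BPE tokenizer over the code dataset.
    """
    ids = list(text.encode("utf-8"))
    write_bin(ids, path)
    return len(ids)
```

Training then memory-maps these files, so swapping datasets only means regenerating the two bins.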
## Expected Results

After 3.5 minutes of training on an H200:

- **Training Loss**: ~2.5-3.0
- **Validation Loss**: ~2.8-3.3
- **Model Size**: ~15MB
- **Code Quality**: basic Python functions
- **Iterations**: ~500-1000
## Use Cases

### Perfect For:

- ✅ **Learning**: understand nanoGPT training
- ✅ **Prototyping**: test ideas quickly
- ✅ **Experiments**: try different configurations
- ✅ **Small Models**: code-generation demos

### Not Suitable For:

- ❌ **Production**: too small for real use
- ❌ **Large Models**: limited by time and parameter budget
- ❌ **Long Training**: 4-minute daily limit
## Daily Workflow

1. **Morning**: check whether you can train today
2. **Prepare**: have your dataset ready
3. **Train**: run a 3.5-minute training session
4. **Test**: generate some code samples
5. **Share**: upload to the HF Hub if the results look good
6. **Wait**: come back tomorrow for more training
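The "check whether you can train today" step maps onto the `daily_limit_YYYY-MM-DD.txt` tracking file mentioned under Monitoring. A sketch, assuming the file simply stores the seconds used that day (the storage format is an assumption):

```python
import datetime
from pathlib import Path


def _limit_file(log_dir: Path, today: datetime.date) -> Path:
    # File name per the Monitoring section: daily_limit_YYYY-MM-DD.txt
    return log_dir / f"daily_limit_{today.isoformat()}.txt"


def seconds_used_today(log_dir: Path, today=None) -> float:
    """Read today's GPU usage in seconds; 0 if no file exists yet."""
    today = today or datetime.date.today()
    f = _limit_file(log_dir, today)
    if not f.exists():
        return 0.0
    return float(f.read_text().strip() or 0.0)


def record_usage(log_dir: Path, seconds: float, today=None) -> None:
    """Add a finished session's duration to today's tracking file."""
    today = today or datetime.date.today()
    total = seconds_used_today(log_dir, today) + seconds
    _limit_file(log_dir, today).write_text(str(total))
```

Because the file name embeds the date, yesterday's file is simply ignored the next day, matching the midnight reset.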
## Troubleshooting

### Common Issues

1. **"Daily limit reached"**
   - Wait until tomorrow
   - Check your timezone (the quota resets at midnight UTC)
2. **"No GPU available"**
   - The H200 might be busy
   - Try again in a few minutes
3. **"Training too slow"**
   - Reduce the model size
   - Increase the batch size
   - Use a smaller context length
4. **"Out of memory"**
   - Reduce `batch_size`
   - Reduce `block_size`
   - Reduce the model size

### Performance Tips

- **Batch Size**: use the largest that fits in memory
- **Context Length**: 512 works well for the free tier
- **Model Size**: 6 layers is a good default
- **Learning Rate**: 1e-3 for fast convergence
## Monitoring

### Wandb Dashboard
- Real-time loss curves
- Training metrics
- Model performance

### HF Hub
- Model checkpoints
- Training logs
- Generated samples

### Local Files
- `out-nano-coder-free/ckpt.pt` - latest model checkpoint
- `daily_limit_YYYY-MM-DD.txt` - usage tracking
## Success Stories

Users have achieved:

- ✅ Basic Python function generation
- ✅ Simple class definitions
- ✅ List comprehensions
- ✅ Error-handling patterns
- ✅ Docstring generation
## Resources

- [Hugging Face Spaces](https://huggingface.co/spaces)
- [Free GPU Access](https://huggingface.co/docs/hub/spaces-sdks-docker-gpu)
- [nanoGPT (original)](https://github.com/karpathy/nanoGPT)
- [Python Code Dataset](https://huggingface.co/datasets/flytech/python-codes-25k)
## Contributing

Want to improve the free H200 setup?

1. **Optimize the model**: make it train faster
2. **Better UI**: improve the Gradio interface
3. **More datasets**: support other code datasets
4. **Documentation**: help others get started
## License

This project follows the same license as the original nanoGPT repository.

---

**Happy Free H200 Training!**

Remember: four minutes a day keeps the AI doctor away!