# Free H200 Training: Nano-Coder on Hugging Face

This guide shows you how to train a nano-coder model using **Hugging Face's free H200 GPU access** (4 minutes daily).
## What You Get

- **Free H200 GPU**: 4 minutes per day
- **No Credit Card Required**: completely free
- **Easy Setup**: just a few clicks
- **Model Sharing**: automatic upload to the HF Hub
## Quick Start

### Option 1: Hugging Face Space (Recommended)

1. **Create an HF Space:**

   ```bash
   huggingface-cli repo create nano-coder-free --type space
   ```

2. **Upload files:**
   - Upload all the Python files to your Space
   - Make sure `app.py` is in the root directory

3. **Configure the Space:**
   - Set **Hardware**: H200 (free tier)
   - Set **Python Version**: 3.9+
   - Set **Requirements**: `requirements.txt`

4. **Launch training:**
   - Go to your Space URL
   - Click "Start Free H200 Training"
   - Wait for training to complete (~3.5 minutes)
### Option 2: Local Setup with the HF Free Tier

1. **Install dependencies:**

   ```bash
   pip install -r requirements.txt
   ```

2. **Set your HF token:**

   ```bash
   export HF_TOKEN="your_token_here"
   ```

3. **Run free training:**

   ```bash
   python hf_free_training.py
   ```
## Model Configuration (Free Tier)

| Parameter | Free Tier | Full Model |
|-----------|-----------|------------|
| **Layers** | 6 | 12 |
| **Heads** | 6 | 12 |
| **Embedding** | 384 | 768 |
| **Context** | 512 | 1024 |
| **Parameters** | ~15M | ~124M |
| **Training Time** | 3.5 min | 2-4 hours |
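The parameter counts above can be roughly sanity-checked with the standard GPT sizing rule of thumb: about 12·n_layer·n_embd² weights in the transformer blocks plus vocab_size·n_embd for the (tied) token embedding. The vocabulary size is an assumption; GPT-2's 50,257 is used below.

```python
def approx_gpt_params(n_layer: int, n_embd: int, vocab_size: int = 50257) -> int:
    """Rough GPT parameter count: transformer blocks + tied token embedding.

    Each block contributes ~12 * n_embd**2 weights (attention + MLP);
    position embeddings, biases, and LayerNorms are small and ignored.
    """
    return 12 * n_layer * n_embd**2 + vocab_size * n_embd


# Full model from the table: 12 layers, 768-dim -> roughly 124M
full = approx_gpt_params(12, 768)
# Free-tier model: 6 layers, 384-dim
free = approx_gpt_params(6, 384)
```

With GPT-2's vocabulary this gives ~123.5M for the full model, matching the table's ~124M; the same formula gives ~30M for the free-tier shape, so the table's ~15M presumably reflects a much smaller trained vocabulary. Treat it as a rough check only.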
## Time Management

- **Daily Limit**: 4 minutes of H200 time
- **Training Time**: 3.5 minutes (leaves a safety buffer)
- **Automatic Stop**: the script stops itself before the time limit
- **Daily Reset**: a fresh 4 minutes every day at midnight UTC
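The automatic-stop behaviour can be sketched as a small time-budget guard that the training loop polls each iteration. The class and the injectable clock are illustrative, not taken from the actual script; the limits come from the figures above.

```python
import time

DAILY_LIMIT_S = 4 * 60        # free-tier quota per day, per this guide
MAX_TRAINING_TIME = 3.5 * 60  # training budget, leaving a safety buffer


class TimeBudget:
    """Track elapsed time so training stops before the quota runs out."""

    def __init__(self, budget_s: float, clock=time.monotonic):
        self.budget_s = budget_s
        self.clock = clock  # injectable for testing
        self.start = clock()

    def remaining(self) -> float:
        """Seconds left in the budget (negative once exhausted)."""
        return self.budget_s - (self.clock() - self.start)

    def exhausted(self) -> bool:
        return self.remaining() <= 0
```

A training loop would then do something like `if budget.exhausted(): save_checkpoint(); break` at the top of each iteration.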
## Features

### Training Features

- ✅ **Automatic Time Tracking**: stops before the limit
- ✅ **Frequent Checkpoints**: every 200 iterations
- ✅ **HF Hub Upload**: models saved automatically
- ✅ **Wandb Logging**: real-time metrics
- ✅ **Progress Monitoring**: time-remaining display

### Generation Features

- ✅ **Interactive UI**: Gradio interface
- ✅ **Custom Prompts**: start from any Python code
- ✅ **Adjustable Parameters**: temperature, max tokens
- ✅ **Real-time Generation**: instant results
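The checkpoint-every-200-iterations behaviour amounts to a simple predicate in the training loop; a sketch (the constant name is illustrative):

```python
CHECKPOINT_INTERVAL = 200  # iterations between checkpoints, per the list above


def should_checkpoint(iter_num: int) -> bool:
    """Save every CHECKPOINT_INTERVAL iterations, skipping iteration 0."""
    return iter_num > 0 and iter_num % CHECKPOINT_INTERVAL == 0
```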
## File Structure

```
nano-coder-free/
├── app.py                   # HF Space app
├── hf_free_training.py      # Free H200 training script
├── prepare_code_dataset.py  # Dataset preparation
├── sample_nano_coder.py     # Code generation
├── requirements.txt         # Dependencies
├── model.py                 # nanoGPT model
├── configurator.py          # Configuration
└── README_free_H200.md      # This file
```
## Customization

### Adjust Training Parameters

Edit `hf_free_training.py`:

```python
# Model size (smaller = faster training)
n_layer = 4   # even smaller
n_head = 4    # even smaller
n_embd = 256  # even smaller

# Training time budget in seconds (be conservative)
MAX_TRAINING_TIME = 3.0 * 60  # 3 minutes

# Batch size (larger = faster, if it fits in memory)
batch_size = 128
```

### Change Dataset

```python
# In prepare_code_dataset.py
dataset = load_dataset("your-dataset")  # your own dataset
```
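`prepare_code_dataset.py` presumably follows nanoGPT's convention of writing token ids to `train.bin`/`val.bin` as flat uint16 arrays. A minimal sketch of that on-disk step, using a byte-level stand-in in place of the real tokenizer:

```python
import numpy as np


def write_bin(token_ids, path: str) -> None:
    """Write token ids as raw uint16 values, nanoGPT's bin-file format."""
    np.array(token_ids, dtype=np.uint16).tofile(path)


def prepare_split(text: str, path: str) -> int:
    """Tokenize one split and write it to disk; returns the token count.

    Byte-level encoding is a stand-in here; the real script would use
    a trained BPE tokenizer over the code dataset.
    """
    ids = list(text.encode("utf-8"))
    write_bin(ids, path)
    return len(ids)
```

Training then memory-maps these files, so swapping datasets only means regenerating the two bins.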
## Expected Results

After 3.5 minutes of training on an H200:

- **Training Loss**: ~2.5-3.0
- **Validation Loss**: ~2.8-3.3
- **Model Size**: ~15MB
- **Code Quality**: basic Python functions
- **Iterations**: ~500-1000
## Use Cases

### Perfect For:

- ✅ **Learning**: understand nanoGPT training
- ✅ **Prototyping**: test ideas quickly
- ✅ **Experiments**: try different configurations
- ✅ **Small Models**: code-generation demos

### Not Suitable For:

- ❌ **Production**: too small for real use
- ❌ **Large Models**: limited by time and parameter budget
- ❌ **Long Training**: 4-minute daily limit
## Daily Workflow

1. **Morning**: check whether you can train today
2. **Prepare**: have your dataset ready
3. **Train**: run a 3.5-minute training session
4. **Test**: generate some code samples
5. **Share**: upload to the HF Hub if the results look good
6. **Wait**: come back tomorrow for more training
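The "check whether you can train today" step maps onto the `daily_limit_YYYY-MM-DD.txt` tracking file mentioned under Monitoring. A sketch, assuming the file simply stores the seconds used that day (the storage format is an assumption):

```python
import datetime
from pathlib import Path


def _limit_file(log_dir: Path, today: datetime.date) -> Path:
    # File name per the Monitoring section: daily_limit_YYYY-MM-DD.txt
    return log_dir / f"daily_limit_{today.isoformat()}.txt"


def seconds_used_today(log_dir: Path, today=None) -> float:
    """Read today's GPU usage in seconds; 0 if no file exists yet."""
    today = today or datetime.date.today()
    f = _limit_file(log_dir, today)
    if not f.exists():
        return 0.0
    return float(f.read_text().strip() or 0.0)


def record_usage(log_dir: Path, seconds: float, today=None) -> None:
    """Add a finished session's duration to today's tracking file."""
    today = today or datetime.date.today()
    total = seconds_used_today(log_dir, today) + seconds
    _limit_file(log_dir, today).write_text(str(total))
```

Because the file name embeds the date, yesterday's file is simply ignored the next day, matching the midnight reset.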
## Troubleshooting

### Common Issues

1. **"Daily limit reached"**
   - Wait until tomorrow
   - Check your timezone (the quota resets at midnight UTC)
2. **"No GPU available"**
   - The H200 might be busy
   - Try again in a few minutes
3. **"Training too slow"**
   - Reduce the model size
   - Increase the batch size
   - Use a smaller context length
4. **"Out of memory"**
   - Reduce `batch_size`
   - Reduce `block_size`
   - Reduce the model size

### Performance Tips

- **Batch Size**: use the largest that fits in memory
- **Context Length**: 512 works well for the free tier
- **Model Size**: 6 layers is a good default
- **Learning Rate**: 1e-3 for fast convergence
## Monitoring

### Wandb Dashboard
- Real-time loss curves
- Training metrics
- Model performance

### HF Hub
- Model checkpoints
- Training logs
- Generated samples

### Local Files
- `out-nano-coder-free/ckpt.pt` - latest model checkpoint
- `daily_limit_YYYY-MM-DD.txt` - usage tracking
## Success Stories

Users have achieved:

- ✅ Basic Python function generation
- ✅ Simple class definitions
- ✅ List comprehensions
- ✅ Error-handling patterns
- ✅ Docstring generation
## Resources

- [Hugging Face Spaces](https://huggingface.co/spaces)
- [Free GPU Access](https://huggingface.co/docs/hub/spaces-sdks-docker-gpu)
- [nanoGPT (original)](https://github.com/karpathy/nanoGPT)
- [Python Code Dataset](https://huggingface.co/datasets/flytech/python-codes-25k)
## Contributing

Want to improve the free H200 setup?

1. **Optimize the model**: make it train faster
2. **Better UI**: improve the Gradio interface
3. **More datasets**: support other code datasets
4. **Documentation**: help others get started
## License

This project follows the same license as the original nanoGPT repository.

---

**Happy Free H200 Training!**

Remember: four minutes a day keeps the AI doctor away!