---
title: Tp 1 Dgx Node Estimator
emoji: ⚙️
colorFrom: purple
colorTo: yellow
sdk: gradio
sdk_version: 5.34.0
app_file: app.py
pinned: false
license: mit
short_description: H100 node estimation for NVIDIA TRDC
---
Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
# 🚀 H100 Node & CUDA Version Estimator
An interactive Gradio application for estimating H100 GPU node requirements and CUDA version recommendations based on your machine learning workload specifications.
## Features
- **Comprehensive Model Support**: 40+ models, including:
- **Text Models**: LLaMA-2/3/3.1, Nemotron-4, Qwen2/2.5
- **Vision-Language**: Qwen-VL, Qwen2-VL, NVIDIA VILA series
- **Audio Models**: Qwen-Audio, Qwen2-Audio
- **Physics-ML**: NVIDIA PhysicsNeMo (FNO, PINN, GraphCast, SFNO)
- **Smart Estimation**: Calculates memory requirements including model weights, KV cache, and operational overhead
- **Multimodal Support**: Handles vision-language and audio-language models with specialized memory calculations
- **Use Case Optimization**: Provides different estimates for inference, training, and fine-tuning scenarios
- **Precision Support**: Handles different precision formats (FP32, FP16, BF16, INT8, INT4)
- **Interactive Visualizations**: Memory breakdown charts and node utilization graphs
- **CUDA Recommendations**: Suggests optimal CUDA versions and driver requirements
## Installation
1. Clone the repository:
```bash
git clone <repository-url>
cd tp-1-dgx-node-estimator
```
2. Install dependencies:
```bash
pip install -r requirements.txt
```
## Usage
1. Run the application:
```bash
python app.py
```
2. Open your browser and navigate to `http://localhost:7860`
3. Configure your parameters:
   - **Model**: Select from the supported models (LLaMA, Nemotron, Qwen2/2.5, vision-language, audio, and PhysicsNeMo variants)
- **Input Tokens**: Number of input tokens per request
- **Output Tokens**: Number of output tokens per request
- **Batch Size**: Number of concurrent requests
- **Use Case**: Choose between inference, training, or fine-tuning
- **Precision**: Select model precision/quantization level
4. Click "💡 Estimate Requirements" to get your recommendations
## Key Calculations
### Memory Estimation
- **Model Memory**: Base model weights adjusted for precision
- **KV Cache**: Calculated based on sequence length and model architecture
- **Overhead**: Use-case specific multipliers:
- Inference: 1.2x (20% overhead)
- Training: 3.0x (gradients + optimizer states)
- Fine-tuning: 2.5x (moderate overhead)
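These pieces combine into a single estimate. A minimal sketch, assuming a simplified transformer memory model (the function name, architecture numbers, and the example are illustrative, not the app's exact code):

```python
def estimate_memory_gb(
    params_b: float,         # parameter count in billions
    bytes_per_param: float,  # 4 = FP32, 2 = FP16/BF16, 1 = INT8, 0.5 = INT4
    n_layers: int,
    hidden_size: int,
    seq_len: int,            # input tokens + output tokens
    batch_size: int,
    overhead: float,         # 1.2 inference, 3.0 training, 2.5 fine-tuning
) -> float:
    # Model weights: params (in billions) times bytes per parameter
    weights_gb = params_b * bytes_per_param
    # KV cache: 2 tensors (K and V) per layer, one entry per token
    kv_gb = 2 * n_layers * hidden_size * seq_len * batch_size * bytes_per_param / 1e9
    return (weights_gb + kv_gb) * overhead

# Example: LLaMA-3-70B in FP16, 4096 in + 1024 out, batch 4, inference
print(f"{estimate_memory_gb(70, 2, 80, 8192, 5120, 4, 1.2):.1f} GB")  # ~232 GB
```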
### Node Calculation
- **H100 Node**: 8 × H100 GPUs (80GB each) per node = 640GB HBM3 total (576GB usable per node)
- **Model Parallelism**: Automatic consideration for large models
- **Memory Efficiency**: Optimal distribution across nodes
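Given a total memory estimate, the node count reduces to a ceiling division against the usable per-node memory. A sketch using the 576GB figure above:

```python
import math

H100_NODE_USABLE_GB = 576  # 8 x H100 (640GB HBM3) minus runtime reserve

def nodes_required(total_memory_gb: float) -> int:
    """Smallest number of 8-GPU H100 nodes that fits the estimated footprint."""
    return max(1, math.ceil(total_memory_gb / H100_NODE_USABLE_GB))

print(nodes_required(232.4))  # 1 (the LLaMA-3-70B example above)
print(nodes_required(900.0))  # 2
```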
## Example Scenarios
| Model | Tokens (In/Out) | Batch Size | Use Case | Precision | Estimated Nodes |
|-------|----------------|------------|----------|-----------|----------------|
| LLaMA-3-8B | 2048/512 | 1 | Inference | FP16 | 1 |
| LLaMA-3-70B | 4096/1024 | 4 | Inference | FP16 | 1 |
| Qwen2.5-72B | 8192/2048 | 2 | Fine-tuning | BF16 | 1 |
| Nemotron-4-340B | 2048/1024 | 1 | Inference | INT8 | 1-2 |
| Qwen2-VL-7B | 1024/256 | 1 | Inference | FP16 | 1 |
| VILA-1.5-13B | 2048/512 | 2 | Inference | BF16 | 1 |
| Qwen2-Audio-7B | 1024/256 | 1 | Inference | FP16 | 1 |
| PhysicsNeMo-FNO-Large | 512/128 | 8 | Training | FP32 | 1 |
| PhysicsNeMo-GraphCast-Medium | 1024/256 | 4 | Training | FP16 | 1 |
## CUDA Recommendations
The application provides tailored CUDA version recommendations:
- **Optimal**: CUDA 12.4 with cuDNN 8.9+
- **Recommended**: CUDA 12.1+ with cuDNN 8.7+
- **Minimum**: CUDA 11.8 with cuDNN 8.5+
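To check your local stack against these tiers, PyTorch reports its CUDA build and the device compute capability (assumes PyTorch is installed; the H100's compute capability is 9.0):

```python
import torch

print("CUDA (PyTorch build):", torch.version.cuda)  # e.g. "12.4"
print("cuDNN:", torch.backends.cudnn.version())     # e.g. 8907 for 8.9.x
if torch.cuda.is_available():
    print("Device:", torch.cuda.get_device_name(0))
    # H100 reports (9, 0)
    print("Compute capability:", torch.cuda.get_device_capability(0))
```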
## Output Features
### 📊 Detailed Analysis
- Complete memory breakdown
- Parameter counts and model specifications
- Step-by-step calculation explanation
### 🔧 CUDA Recommendations
- Version compatibility matrix
- Driver requirements
- Compute capability information
### 📈 Memory Utilization
- Visual memory breakdown (pie chart)
- Node utilization distribution (bar chart)
- Efficiency metrics
## Technical Details
### Supported Models
#### Text Models
- **LLaMA**: 2-7B, 2-13B, 2-70B, 3-8B, 3-70B, 3.1-8B, 3.1-70B, 3.1-405B
- **Nemotron**: 4-15B, 4-340B
- **Qwen2**: 0.5B, 1.5B, 7B, 72B
- **Qwen2.5**: 0.5B, 1.5B, 7B, 14B, 32B, 72B
#### Vision-Language Models
- **Qwen-VL**: Base, Chat, Plus, Max variants
- **Qwen2-VL**: 2B, 7B, 72B
- **NVIDIA VILA**: 1.5-3B, 1.5-8B, 1.5-13B, 1.5-40B
#### Audio Models
- **Qwen-Audio**: Base, Chat variants
- **Qwen2-Audio**: 7B
#### Physics-ML Models (NVIDIA PhysicsNeMo)
- **Fourier Neural Operators (FNO)**: Small (1M), Medium (10M), Large (50M)
- **Physics-Informed Neural Networks (PINN)**: Small (0.5M), Medium (5M), Large (20M)
- **GraphCast**: Small (50M), Medium (200M), Large (1B) - for weather/climate modeling
- **Spherical FNO (SFNO)**: Small (25M), Medium (100M), Large (500M) - for global simulations
### Precision Impact
- **FP32**: Full precision (4 bytes per parameter)
- **FP16/BF16**: Half precision (2 bytes per parameter)
- **INT8**: 8-bit quantization (1 byte per parameter)
- **INT4**: 4-bit quantization (0.5 bytes per parameter)
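This mapping translates directly into weight memory. A quick illustration (decimal GB; LLaMA-3.1-405B chosen as an example from the supported list):

```python
BYTES_PER_PARAM = {"FP32": 4.0, "FP16": 2.0, "BF16": 2.0, "INT8": 1.0, "INT4": 0.5}

params = 405e9  # LLaMA-3.1-405B
for precision, nbytes in BYTES_PER_PARAM.items():
    print(f"{precision}: {params * nbytes / 1e9:,.0f} GB of weights")
# FP32: 1,620 GB / FP16 and BF16: 810 GB / INT8: 405 GB / INT4: ~202 GB
```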
### Multimodal Considerations
- **Vision Models**: Process images as token sequences (typically 256-1024 tokens per image)
- **Audio Models**: Handle audio segments with frame-based tokenization
- **Memory Overhead**: Additional memory for vision/audio encoders and cross-modal attention
- **Token Estimation**: Consider multimodal inputs when calculating token counts
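A hedged sketch of that token accounting, treating multimodal inputs as extra sequence length (the 576 tokens-per-image default is an assumed midpoint of the 256-1024 range above, and the audio rate is purely illustrative):

```python
def effective_tokens(
    text_tokens: int,
    num_images: int = 0,
    audio_seconds: float = 0.0,
    tokens_per_image: int = 576,     # assumed midpoint of the 256-1024 range
    tokens_per_audio_sec: int = 25,  # illustrative frame-based rate
) -> int:
    """Rough effective sequence length for KV-cache sizing."""
    return (text_tokens
            + num_images * tokens_per_image
            + int(audio_seconds * tokens_per_audio_sec))

# A 1024-token prompt with two images behaves like a ~2176-token request:
print(effective_tokens(1024, num_images=2))  # 2176
```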
### PhysicsNeMo Considerations
- **Grid-Based Data**: Physics models work with spatial/temporal grids rather than text tokens
- **Batch Training**: Physics-ML models typically require larger batch sizes for stable training
- **Memory Patterns**: Different from LLMs - less KV cache, more gradient memory for PDE constraints
- **Precision Requirements**: Many physics simulations require FP32 for numerical stability
- **Use Cases**:
- **FNO**: Solving PDEs on regular grids (fluid dynamics, heat transfer)
- **PINN**: Physics-informed training with PDE constraints
- **GraphCast**: Weather prediction and climate modeling
- **SFNO**: Global atmospheric and oceanic simulations
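A rough sketch of how such a footprint differs from the LLM formula above, under stated assumptions (FP32 training with Adam, so weights + gradients + two moments = 4 parameter copies, and a coarse one-feature-map-per-layer activation estimate; all numbers are illustrative):

```python
def physics_train_memory_gb(
    params_m: float,    # parameter count in millions
    grid_points: int,   # e.g. 256 * 256 for a 2D grid
    batch_size: int,
    n_layers: int,
    channels: int,
    bytes_per_val: float = 4.0,  # FP32 for numerical stability
) -> float:
    # Weights + gradients + Adam first/second moments = 4 copies of the params
    states_gb = params_m * 1e6 * bytes_per_val * 4 / 1e9
    # Activations kept for backprop: one feature map per layer
    acts_gb = n_layers * batch_size * grid_points * channels * bytes_per_val / 1e9
    return states_gb + acts_gb

# FNO-Large (50M params) on a 256x256 grid, batch 8, 4 layers, 32 channels
print(f"{physics_train_memory_gb(50, 256 * 256, 8, 4, 32):.1f} GB")  # ~1.1 GB
```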
## Limitations
- Estimates are approximate and may vary based on:
- Specific model implementation details
- Framework overhead (PyTorch, TensorFlow, etc.)
- Hardware configuration
- Network topology for multi-node setups
## Contributing
Feel free to submit issues and enhancement requests!
## License
This project is licensed under the MIT License - see the LICENSE file for details.
## Notes
- **Node Configuration**: Each H100 node contains 8 × H100 GPUs (640GB total memory)
- For production deployments, consider adding a 10-20% buffer to estimates
- Network bandwidth and storage requirements are not included in calculations
- Estimates assume optimal memory layout and efficient implementations
- Multi-node setups require high-speed interconnects (InfiniBand/NVLink) for optimal performance