---
title: Tp 1 Dgx Node Estimator
emoji: ⚙️
colorFrom: purple
colorTo: yellow
sdk: gradio
sdk_version: 5.34.0
app_file: app.py
pinned: false
license: mit
short_description: for NVIDIA TRDC estimation
---

Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

# 🚀 H100 Node & CUDA Version Estimator

An interactive Gradio application for estimating H100 GPU node requirements and CUDA version recommendations based on your machine learning workload specifications.

## Features

- **Comprehensive Model Support**: Supports 40+ models including:
  - **Text Models**: LLaMA-2/3/3.1, Nemotron-4, Qwen2/2.5
  - **Vision-Language**: Qwen-VL, Qwen2-VL, NVIDIA VILA series
  - **Audio Models**: Qwen-Audio, Qwen2-Audio
  - **Physics-ML**: NVIDIA PhysicsNeMo (FNO, PINN, GraphCast, SFNO)
- **Smart Estimation**: Calculates memory requirements including model weights, KV cache, and operational overhead
- **Multimodal Support**: Handles vision-language and audio-language models with specialized memory calculations
- **Use Case Optimization**: Provides different estimates for inference, training, and fine-tuning scenarios
- **Precision Support**: Handles different precision formats (FP32, FP16, BF16, INT8, INT4)
- **Interactive Visualizations**: Memory breakdown charts and node utilization graphs
- **CUDA Recommendations**: Suggests optimal CUDA versions and driver requirements

## Installation

1. Clone the repository:
```bash
git clone <repository-url>
cd tp-1-dgx-node-estimator
```

2. Install dependencies:
```bash
pip install -r requirements.txt
```

## Usage

1. Run the application:
```bash
python app.py
```

2. Open your browser and navigate to `http://localhost:7860`

3. Configure your parameters:
   - **Model**: Select from the supported model families (LLaMA, Nemotron, Qwen2/2.5, Qwen-VL/Audio, VILA, PhysicsNeMo)
   - **Input Tokens**: Number of input tokens per request
   - **Output Tokens**: Number of output tokens per request
   - **Batch Size**: Number of concurrent requests
   - **Use Case**: Choose between inference, training, or fine-tuning
   - **Precision**: Select model precision/quantization level

4. Click "💡 Estimate Requirements" to get your recommendations

## Key Calculations

### Memory Estimation
- **Model Memory**: Base model weights adjusted for precision
- **KV Cache**: Calculated based on sequence length and model architecture
- **Overhead**: Use-case specific multipliers (see the sketch after this list):
  - Inference: 1.2x (20% overhead)
  - Training: 3.0x (gradients + optimizer states)
  - Fine-tuning: 2.5x (moderate overhead)
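
A minimal sketch of how these pieces combine, assuming illustrative architecture numbers (layer count, hidden size) that would normally come from the model config; the constants and function names here are for illustration, not the app's actual API:

```python
BYTES_PER_PARAM = {"fp32": 4.0, "fp16": 2.0, "bf16": 2.0, "int8": 1.0, "int4": 0.5}
OVERHEAD = {"inference": 1.2, "training": 3.0, "fine-tuning": 2.5}

def estimate_memory_gb(params_billion, precision, use_case,
                       num_layers, hidden_size, seq_len, batch_size):
    """Rough total memory in GB: weights + KV cache, scaled by use-case overhead."""
    weights_gb = params_billion * BYTES_PER_PARAM[precision]  # 1e9 params x bytes/param ~= GB
    # KV cache: 2 tensors (K and V) x layers x hidden x sequence x batch x bytes/element
    kv_gb = (2 * num_layers * hidden_size * seq_len * batch_size
             * BYTES_PER_PARAM[precision]) / 1e9
    return (weights_gb + kv_gb) * OVERHEAD[use_case]

# LLaMA-3-8B, FP16 inference, 2048 in + 512 out tokens, batch 1 -> ~21 GB
print(estimate_memory_gb(8, "fp16", "inference",
                         num_layers=32, hidden_size=4096,
                         seq_len=2048 + 512, batch_size=1))
```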

### Node Calculation
- **H100 Node**: 8 × H100 GPUs per node = 640GB HBM3 total, of which ~576GB per node is treated as usable
- **Model Parallelism**: Models too large for a single GPU are assumed to be sharded across GPUs and, when necessary, across nodes
- **Memory Efficiency**: Estimated memory is distributed evenly across the nodes in use (see the sketch below)
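
Given a total memory estimate, the node count reduces to a ceiling division against usable per-node memory; this is a minimal sketch under those assumptions, not the app's exact code:

```python
import math

USABLE_GB_PER_NODE = 576  # 8 x 80 GB H100 = 640 GB raw; ~90% treated as usable

def estimate_nodes(total_memory_gb: float) -> int:
    """Smallest whole number of 8-GPU H100 nodes that fits the estimate."""
    return max(1, math.ceil(total_memory_gb / USABLE_GB_PER_NODE))

print(estimate_nodes(21.0))   # LLaMA-3-8B inference example above -> 1
print(estimate_nodes(700.0))  # a hypothetical 700 GB workload -> 2
```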

## Example Scenarios

| Model | Tokens (In/Out) | Batch Size | Use Case | Precision | Estimated Nodes |
|-------|----------------|------------|----------|-----------|----------------|
| LLaMA-3-8B | 2048/512 | 1 | Inference | FP16 | 1 |
| LLaMA-3-70B | 4096/1024 | 4 | Inference | FP16 | 1 |
| Qwen2.5-72B | 8192/2048 | 2 | Fine-tuning | BF16 | 1 |
| Nemotron-4-340B | 2048/1024 | 1 | Inference | INT8 | 1-2 |
| Qwen2-VL-7B | 1024/256 | 1 | Inference | FP16 | 1 |
| VILA-1.5-13B | 2048/512 | 2 | Inference | BF16 | 1 |
| Qwen2-Audio-7B | 1024/256 | 1 | Inference | FP16 | 1 |
| PhysicsNeMo-FNO-Large | 512/128 | 8 | Training | FP32 | 1 |
| PhysicsNeMo-GraphCast-Medium | 1024/256 | 4 | Training | FP16 | 1 |
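
To sanity-check one row with the memory math above: LLaMA-3-70B in FP16 needs about 140 GB for weights (70B × 2 bytes); a naive KV cache for batch 4 at 5,120 tokens (ignoring grouped-query attention) adds roughly 54 GB, and the 1.2× inference overhead brings the total to about 233 GB, comfortably within one node's 576 GB of usable memory.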

## CUDA Recommendations

The application provides tailored CUDA version recommendations:

- **Optimal**: CUDA 12.4 with cuDNN 8.9+
- **Recommended**: CUDA 12.1 or newer with cuDNN 8.7+
- **Minimum**: CUDA 11.8 with cuDNN 8.5+
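
One way these tiers could be encoded is as simple lookup data; the structure below is an assumed illustration (including the Linux driver branches, which follow NVIDIA's CUDA release notes), not the app's actual data model:

```python
# Assumed encoding of the recommendation tiers above; driver branches are illustrative.
CUDA_TIERS = {
    "optimal":     {"cuda": "12.4",  "cudnn": "8.9+", "min_driver": "R550"},
    "recommended": {"cuda": "12.1+", "cudnn": "8.7+", "min_driver": "R530"},
    "minimum":     {"cuda": "11.8",  "cudnn": "8.5+", "min_driver": "R520"},
}

for tier, req in CUDA_TIERS.items():
    print(f"{tier:>11}: CUDA {req['cuda']}, cuDNN {req['cudnn']}, driver {req['min_driver']}")
```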

## Output Features

### 📊 Detailed Analysis
- Complete memory breakdown
- Parameter counts and model specifications
- Step-by-step calculation explanation

### 🔧 CUDA Recommendations
- Version compatibility matrix
- Driver requirements
- Compute capability information

### 📈 Memory Utilization
- Visual memory breakdown (pie chart)
- Node utilization distribution (bar chart)
- Efficiency metrics

## Technical Details

### Supported Models
#### Text Models
- **LLaMA**: 2-7B, 2-13B, 2-70B, 3-8B, 3-70B, 3.1-8B, 3.1-70B, 3.1-405B
- **Nemotron**: 4-15B, 4-340B
- **Qwen2**: 0.5B, 1.5B, 7B, 72B
- **Qwen2.5**: 0.5B, 1.5B, 7B, 14B, 32B, 72B

#### Vision-Language Models
- **Qwen-VL**: Base, Chat, Plus, Max variants
- **Qwen2-VL**: 2B, 7B, 72B
- **NVIDIA VILA**: 1.5-3B, 1.5-8B, 1.5-13B, 1.5-40B

#### Audio Models
- **Qwen-Audio**: Base, Chat variants
- **Qwen2-Audio**: 7B

#### Physics-ML Models (NVIDIA PhysicsNeMo)
- **Fourier Neural Operators (FNO)**: Small (1M), Medium (10M), Large (50M)
- **Physics-Informed Neural Networks (PINN)**: Small (0.5M), Medium (5M), Large (20M)
- **GraphCast**: Small (50M), Medium (200M), Large (1B) - for weather/climate modeling
- **Spherical FNO (SFNO)**: Small (25M), Medium (100M), Large (500M) - for global simulations

### Precision Impact
- **FP32**: Full precision (4 bytes per parameter)
- **FP16/BF16**: Half precision (2 bytes per parameter)
- **INT8**: 8-bit quantization (1 byte per parameter)
- **INT4**: 4-bit quantization (0.5 bytes per parameter)
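
For example, LLaMA-3.1-405B needs roughly 1,620 GB for weights alone in FP32 (405B × 4 bytes) but only about 203 GB in INT4, the difference between spanning three nodes and fitting on one.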

### Multimodal Considerations
- **Vision Models**: Process images as token sequences (typically 256-1024 tokens per image)
- **Audio Models**: Handle audio segments with frame-based tokenization
- **Memory Overhead**: Additional memory for vision/audio encoders and cross-modal attention
- **Token Estimation**: Fold multimodal inputs into the token count before estimating memory (see the sketch below)
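
One way to account for this is to convert images and audio into an effective token count and reuse the text-model memory math; the per-image and per-second rates below are illustrative assumptions, and the helper is hypothetical rather than the app's API:

```python
# Hypothetical helper: fold multimodal inputs into one effective token count.
IMAGE_TOKENS = 576            # assumed encoder output, within the 256-1024 range above
AUDIO_TOKENS_PER_SECOND = 25  # assumed frame-based tokenization rate

def effective_input_tokens(text_tokens: int, num_images: int = 0,
                           audio_seconds: float = 0.0) -> int:
    """Text tokens plus the token-equivalents of attached images/audio."""
    return (text_tokens
            + num_images * IMAGE_TOKENS
            + int(audio_seconds * AUDIO_TOKENS_PER_SECOND))

print(effective_input_tokens(1024, num_images=2))      # 1024 + 2*576 = 2176
print(effective_input_tokens(1024, audio_seconds=30))  # 1024 + 750 = 1774
```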

### PhysicsNeMo Considerations
- **Grid-Based Data**: Physics models work with spatial/temporal grids rather than text tokens (see the sketch after this list)
- **Batch Training**: Physics-ML models typically require larger batch sizes for stable training
- **Memory Patterns**: Different from LLMs - less KV cache, more gradient memory for PDE constraints
- **Precision Requirements**: Many physics simulations require FP32 for numerical stability
- **Use Cases**: 
  - **FNO**: Solving PDEs on regular grids (fluid dynamics, heat transfer)
  - **PINN**: Physics-informed training with PDE constraints
  - **GraphCast**: Weather prediction and climate modeling
  - **SFNO**: Global atmospheric and oceanic simulations
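
Because inputs are grids, a back-of-envelope activation estimate replaces the KV-cache term; this sketch and its example shapes are assumptions for illustration:

```python
def grid_activation_gb(batch: int, channels: int, height: int, width: int,
                       depth: int = 1, bytes_per_elem: int = 4) -> float:
    """Memory for one activation tensor on a spatial grid, in GB (FP32 by default)."""
    return batch * channels * height * width * depth * bytes_per_elem / 1e9

# Batch of 8 on a 720 x 1440 lat/lon grid with 64 channels (GraphCast-like scale)
print(grid_activation_gb(8, 64, 720, 1440))  # ~2.12 GB per activation tensor
```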

## Limitations

- Estimates are approximate and may vary based on:
  - Specific model implementation details
  - Framework overhead (PyTorch, TensorFlow, etc.)
  - Hardware configuration
  - Network topology for multi-node setups

## Contributing

Feel free to submit issues and enhancement requests!

## License

This project is licensed under the MIT License - see the LICENSE file for details.

## Notes

- **Node Configuration**: Each H100 node contains 8 × H100 GPUs (640GB total memory)
- For production deployments, consider adding a 10-20% buffer to estimates
- Network bandwidth and storage requirements are not included in calculations
- Estimates assume optimal memory layout and efficient implementations
- Multi-node setups require high-speed interconnects (e.g., InfiniBand between nodes; NVLink/NVSwitch within a node) for optimal performance