# Pipeline Parallelism Emulation
This project provides tools for emulating and visualizing pipeline parallelism strategies used in large language model training.
## Overview
Pipeline parallelism is a technique used to train large models by partitioning the model across multiple devices and processing data in a pipelined fashion. This project allows you to:
- Simulate different pipeline parallelism strategies (1F1B, Interleaved 1F1B, ZB-1P, and overlapped variants)
- Visualize the execution schedule on multiple devices
- Compare different strategies for efficiency
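As a concrete illustration of what the emulator schedules, here is a minimal, self-contained sketch of the per-device operation order under the classic 1F1B strategy. This is an illustrative toy, not the project's implementation in `src/strategies.py`:

```python
def one_f_one_b_order(num_devices: int, num_microbatches: int, device: int):
    """Order of forward (F) / backward (B) microbatch ops on one device
    under classic 1F1B: a warmup of forwards, a steady phase alternating
    one forward and one backward, then a cooldown draining the backwards."""
    warmup = min(num_devices - 1 - device, num_microbatches)
    order = [("F", i) for i in range(warmup)]
    f, b = warmup, 0
    while f < num_microbatches:      # steady state: one F, then one B
        order.append(("F", f)); f += 1
        order.append(("B", b)); b += 1
    while b < num_microbatches:      # cooldown: remaining backwards
        order.append(("B", b)); b += 1
    return order

# The last device has no warmup: it strictly alternates F0 B0 F1 B1 ...
print(one_f_one_b_order(4, 8, 3)[:4])  # [('F', 0), ('B', 0), ('F', 1), ('B', 1)]
```

Earlier devices perform more warmup forwards (device 0 runs three forwards before its first backward in the 4-device case), which is exactly the staircase pattern the visualization dashboard renders.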
## Features
- Supported pipeline strategies:
  - 1F1B
  - Interleaved 1F1B
  - ZB-1P
  - 1F1B with batch overlap
  - Interleaved 1F1B with overlap
- Visualization:
  - Interactive visualization dashboard using Plotly/Dash
- Configuration:
  - Simulation parameters configurable through Hydra
  - Stage-specific operation latencies for performance projection
## Installation
This project uses [uv](https://github.com/astral-sh/uv) for dependency management.
Install `uv` if it is not already available on your system:
```bash
# On macOS and Linux.
curl -LsSf https://astral.sh/uv/install.sh | sh
```
## Usage
Running the 1F1B strategy:
```bash
uv run python main.py strategy=1f1b num_devices=4 num_stages=4 num_batches=8
```

Running the interleave strategy:
```bash
uv run python main.py strategy=interleave num_devices=4 num_stages=8 num_batches=8
```

Running the ZB-1P strategy:
```bash
uv run python main.py strategy=zb1p num_devices=4 num_stages=4 num_batches=8
```

Running the 1F1B-batch-overlap strategy:
```bash
uv run python main.py strategy=1f1b_overlap num_devices=4 num_stages=4 num_batches=8
```

Running the 1F1B-interleave-overlap strategy:
```bash
uv run python main.py strategy=1f1b_interleave_overlap num_devices=4 num_stages=8 num_batches=8
```
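All of the commands above use `num_batches=8` on `num_devices=4`. As a quick sanity check on what to expect from the visualization, the ideal bubble (idle-time) fraction of a synchronous pipeline schedule such as 1F1B, assuming uniform stage times and no communication cost, is `(p - 1) / (m + p - 1)` for `p` pipeline stages and `m` microbatches:

```python
def bubble_fraction(num_devices: int, num_microbatches: int) -> float:
    """Ideal bubble fraction of a synchronous pipeline schedule (e.g. 1F1B),
    assuming uniform per-stage times and free communication."""
    p, m = num_devices, num_microbatches
    return (p - 1) / (m + p - 1)

print(bubble_fraction(4, 8))  # 3/11, roughly 27% idle time for the runs above
```

Increasing the number of microbatches shrinks the bubble, which is why comparing strategies at several `num_batches` values is informative.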

## Configuration
The default configuration is in `conf/config.yaml`. You can override any parameter on the command line or create configuration groups for different scenarios.
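For reference, a minimal `conf/config.yaml` might look like the following. This is a hypothetical sketch assembled only from the parameters that appear in the usage examples (`strategy`, `num_devices`, `num_stages`, `num_batches`, `op_times.forward`, `op_times.backward`); the actual file in the repository may define more or different options:

```yaml
# Hypothetical sketch, keys taken from the usage examples above.
strategy: 1f1b
num_devices: 4
num_stages: 4
num_batches: 8
op_times:
  forward: 1.0   # illustrative per-operation latencies
  backward: 2.0
```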
### Using Different Configuration Files
You can use different configuration files with Hydra in several ways:
#### Recommended Approach
1. Create multiple configuration files in the `conf` directory for different use cases:
```
conf/
├── config.yaml   # Default configuration
└── model_A.yaml  # Your own config with stage-specific latencies for performance projection
```
2. Run with your desired configuration using the `--config-name` flag:
```bash
uv run python main.py --config-name=model_A
```
#### Override Specific Parameters
You can also override specific parameters at runtime:
```bash
uv run python main.py op_times.forward=0.5 op_times.backward=1.0 num_batches=6
```
## Project Structure
```
PP-Emulation/
├── conf/                   # Hydra configuration files
│   └── config.yaml         # Default configuration
├── src/                    # Source code
│   ├── __init__.py         # Package initialization
│   ├── execution_model.py  # Schedule execution models
│   ├── strategies.py       # Pipeline parallelism strategies
│   └── visualizer.py       # Visualization utilities
├── main.py                 # Main entry point
├── pyproject.toml          # Project metadata and dependencies
└── README.md               # This file
```
## References
1. _PipeDream: Fast and Efficient Pipeline Parallel DNN Training_. [arxiv](https://arxiv.org/abs/1806.03377)
2. _Efficient Large-Scale Language Model Training on GPU Clusters Using Megatron-LM_. [arxiv](https://arxiv.org/abs/2104.04473)
3. _Zero Bubble Pipeline Parallelism_. [arxiv](https://arxiv.org/abs/2401.10241)
4. _MoE A2A Communication-Computation Overlap Based on 1F1B_ (in Chinese). [blog](https://zhuanlan.zhihu.com/p/28463368206)
## License
This project is licensed under the MIT License - see the LICENSE file for details.
## Contributing
Contributions are welcome! Please feel free to submit a Pull Request.