This project provides tools for emulating and visualizing pipeline parallelism strategies.

Pipeline parallelism is a technique used to train large models by partitioning the model across multiple devices and processing data in a pipelined fashion. This project allows you to:

- Simulate different pipeline parallelism strategies (1F1B, Interleaved, Zero-Bubble, etc.)
- Visualize the execution schedule on multiple devices
- Compare different strategies for efficiency (a back-of-the-envelope model follows below)
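
For a quick intuition about why these strategies differ, recall the standard pipeline-bubble analysis from the Megatron-LM paper (reference 2 below): with `p` pipeline stages and `m` microbatches, each device idles for `p - 1` of the `m + p - 1` schedule slots. The sketch below is a back-of-the-envelope model, not this repo's emulator, which computes schedules explicitly:

```python
# Back-of-the-envelope timing model for a non-interleaved 1F1B schedule
# (illustrative only). f and b are one stage's forward and backward times
# per microbatch; b = 2f is a common rule of thumb.
def total_time_1f1b(p: int, m: int, f: float = 1.0, b: float = 2.0) -> float:
    # m steady-state slots plus (p - 1) warmup/drain slots per device.
    return (m + p - 1) * (f + b)

def bubble_fraction(p: int, m: int) -> float:
    # Idle time as a fraction of the whole schedule length.
    return (p - 1) / (m + p - 1)

for m in (4, 8, 32):
    print(f"p=4, m={m}: time={total_time_1f1b(4, m):5.1f}, "
          f"bubble={bubble_fraction(4, m):.1%}")
```

Raising `m` shrinks the bubble (42.9% at `m=4` down to 8.6% at `m=32` for `p=4`), while zero-bubble schedules such as ZB-1P attack the `p - 1` idle term itself by splitting the backward pass into activation- and weight-gradient parts.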

## Features

- **Supported Pipeline Strategies**:
  - 1F1B (One-Forward-One-Backward)
  - Interleaved 1F1B
  - Zero-Bubble 1F1B (ZB-1P)
  - 1F1B with computation-communication overlap
  - Interleaved 1F1B with computation-communication overlap
- **Visualization**:
  - Interactive visualization dashboard using Plotly/Dash (see the sketch after this list)
- **Configuration**:
  - Configurable simulation parameters through Hydra
  - Customizable stage latency and communication costs
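
Here is a minimal, self-contained sketch of the Plotly side (not this project's actual dashboard code), drawing a schedule as a Gantt-style chart; the `(device, start, duration, kind)` tuple layout is an assumption for illustration:

```python
# Gantt-style schedule plot with Plotly: horizontal bars where base= is the
# start time and x= the duration. Toy 1F1B trace: 2 stages, 2 microbatches,
# f=1, b=2, communication time ignored.
import plotly.graph_objects as go

ops = [
    ("device 0", 0, 1, "F"), ("device 0", 1, 1, "F"),
    ("device 0", 4, 2, "B"), ("device 0", 7, 2, "B"),
    ("device 1", 1, 1, "F"), ("device 1", 2, 2, "B"),
    ("device 1", 4, 1, "F"), ("device 1", 5, 2, "B"),
]

fig = go.Figure()
for device, start, duration, kind in ops:
    fig.add_trace(go.Bar(
        y=[device], x=[duration], base=[start], orientation="h",
        marker_color="#1f77b4" if kind == "F" else "#ff7f0e",
        text=kind, showlegend=False,
    ))
fig.update_layout(barmode="overlay", xaxis_title="time",
                  title="Toy pipeline schedule")
fig.show()
```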

## Installation

This project uses [uv](https://github.com/astral-sh/uv) for dependency management.

Set up `uv` if it is not already installed on your machine:

```bash
# On macOS and Linux
curl -LsSf https://astral.sh/uv/install.sh | sh
```

## Usage

### Running the 1F1B strategy

```bash
uv run python main.py strategy=1f1b num_devices=4 num_stages=4 num_batches=8
```

### Running the interleaved strategy

```bash
uv run python main.py strategy=interleave num_devices=4 num_stages=8 num_batches=8
```

### Running the ZB-1P strategy

```bash
uv run python main.py strategy=zb1p num_devices=4 num_stages=4 num_batches=8
```

### Running the 1F1B-batch-overlap strategy

```bash
uv run python main.py strategy=1f1b_overlap num_devices=4 num_stages=4 num_batches=8
```

### Running the 1F1B-interleave-overlap strategy

```bash
uv run python main.py strategy=1f1b_interleave_overlap num_devices=4 num_stages=8 num_batches=8
```

You can use different configuration files with Hydra in several ways:

1. Add your own configuration file to the `conf/` directory:

```
conf/
├── config.yaml   # Default configuration
└── model_A.yaml  # Create your own config with stage-specific latency for performance projection
```

2. Run with your desired configuration using the `--config-name` flag:
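
For instance, with the hypothetical `model_A.yaml` from the tree above: `uv run python main.py --config-name=model_A`. The same selection can also be made programmatically through Hydra's compose API (handy in notebooks or tests); a minimal sketch, assuming `conf/model_A.yaml` defines the keys used in the CLI examples:

```python
# Minimal sketch of Hydra's compose API (not this repo's code): load the
# hypothetical conf/model_A.yaml and apply the same overrides as the CLI runs.
from hydra import compose, initialize
from omegaconf import OmegaConf

with initialize(version_base=None, config_path="conf"):
    cfg = compose(
        config_name="model_A",  # hypothetical config from the tree above
        overrides=["num_devices=4", "num_stages=4", "num_batches=8"],
    )
print(OmegaConf.to_yaml(cfg))  # inspect the resolved configuration
```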

## Project Structure

```
PP-Emulation/
...
└── README.md            # This file
```

## References

1. _PipeDream: Fast and Efficient Pipeline Parallel DNN Training_. [arxiv](https://arxiv.org/abs/1806.03377)
2. _Efficient Large-Scale Language Model Training on GPU Clusters Using Megatron-LM_. [arxiv](https://arxiv.org/abs/2104.04473)
3. _Zero Bubble Pipeline Parallelism_. [arxiv](https://arxiv.org/abs/2401.10241)
4. _Communication-Computation Overlap in MoE Training with 1F1B Pipeline Parallelism_. [blog](https://zhuanlan.zhihu.com/p/28463368206)

## License