Victarry committed on
Commit
f4c58ee
·
1 Parent(s): c224a44

Update readme.

Files changed (1)
  1. README.md +28 -22
README.md CHANGED
@@ -6,58 +6,63 @@ This project provides tools for emulating and visualizing pipeline parallelism s

 Pipeline parallelism is a technique used to train large models by partitioning the model across multiple devices and processing data in a pipelined fashion. This project allows you to:

- - Simulate different pipeline parallelism strategies (1F1B, Interleaved)
+ - Simulate different pipeline parallelism strategies (1F1B, Interleaved, Zero-Bubble, etc.)
 - Visualize the execution schedule on multiple devices
 - Compare different strategies for efficiency

 ## Features
- - Supported Pipeline Stragegies:
-   - 1F1B
-   - Interleaved 1F1B
- - Visualization:
-   - Interactive visualization dashboard using Plotly/Dash
- - Config:
-   - Configurable simulation parameters through Hydra
-   - Each stage
+
+ - **Supported Pipeline Strategies**:
+   - 1F1B (One-Forward-One-Backward)
+   - Interleaved 1F1B
+   - Zero-Bubble 1F1B (ZB-1P)
+   - 1F1B with computation-communication overlap
+   - Interleaved 1F1B with computation-communication overlap
+
+ - **Visualization**:
+   - Interactive visualization dashboard using Plotly/Dash
+
+ - **Configuration**:
+   - Configurable simulation parameters through Hydra
+   - Customizable stage latency and communication costs

 ## Installation

 This project uses [uv](https://github.com/astral-sh/uv) for dependency management.

- Setup `uv` if not installed in your computer:
- ```
- # On macOS and Linux.
+ Setup `uv` if not installed on your computer:
+ ```bash
+ # On macOS and Linux
 curl -LsSf https://astral.sh/uv/install.sh | sh
 ```

 ## Usage

- Running for 1F1B strategy:
+ ### Running for 1F1B strategy:
 ```bash
 uv run python main.py strategy=1f1b num_devices=4 num_stages=4 num_batches=8
 ```
 ![1f1b](assets/1f1b.png)

- Running for interleave strategy:
+ ### Running for interleaved strategy:
 ```bash
 uv run python main.py strategy=interleave num_devices=4 num_stages=8 num_batches=8
 ```
 ![interleave](assets/interleave_1f1b.png)

- Running for ZB-1P strategy:
+ ### Running for ZB-1P strategy:
 ```bash
 uv run python main.py strategy=zb1p num_devices=4 num_stages=4 num_batches=8
 ```
 ![zb1p](assets/zb1p.png)

-
- Running for 1F1B-batch-overlap strategy:
+ ### Running for 1F1B-batch-overlap strategy:
 ```bash
 uv run python main.py strategy=1f1b_overlap num_devices=4 num_stages=4 num_batches=8
 ```
 ![1f1b_overlap](assets/1f1b_overlap.png)

- Running for 1F1B-interleave-overlap strategy:
+ ### Running for 1F1B-interleave-overlap strategy:
 ```bash
 uv run python main.py strategy=1f1b_interleave_overlap num_devices=4 num_stages=8 num_batches=8
 ```
@@ -77,7 +82,7 @@ You can use different configuration files with Hydra in several ways:
 ```
 conf/
 ├── config.yaml # Default configuration
- └── model_A.yaml # Create your own config with stage-specific latency for performance projection.
+ └── model_A.yaml # Create your own config with stage-specific latency for performance projection
 ```

 2. Run with your desired configuration using the `--config-name` flag:
@@ -108,11 +113,12 @@ PP-Emulation/
 └── README.md # This file
 ```

- ## Refences
+ ## References
+
 1. _PipeDream: Fast and Efficient Pipeline Parallel DNN Training_. [arxiv](https://arxiv.org/abs/1806.03377)
 2. _Efficient Large-Scale Language Model Training on GPU Clusters Using Megatron-LM_. [arxiv](https://arxiv.org/abs/2104.04473)
- 3. _Zero Bubble Pipeline Parallelism_ [arxiv](https://arxiv.org/abs/2401.10241)
- 4. MoE A2A Communication-Computation Overlap Based on 1F1B [blog](https://zhuanlan.zhihu.com/p/28463368206)
+ 3. _Zero Bubble Pipeline Parallelism_. [arxiv](https://arxiv.org/abs/2401.10241)
+ 4. _Communication-Computation Overlap in MoE Training with 1F1B Pipeline Parallelism_. [blog](https://zhuanlan.zhihu.com/p/28463368206)

 ## License

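The config tree in the diff suggests authoring a `model_A.yaml` with stage-specific latency for performance projection. The repo's real schema lives in `conf/config.yaml`, which this commit does not show, so the sketch below is a guess: `strategy`, `num_devices`, `num_stages`, and `num_batches` appear in the README's commands, while the latency and communication fields are hypothetical names.

```yaml
# Hypothetical conf/model_A.yaml -- field names below are illustrative,
# not the repo's actual schema; check conf/config.yaml for the real keys.
strategy: 1f1b
num_devices: 4
num_stages: 4
num_batches: 8
# Stage-specific latency (hypothetical key), e.g. a slower last stage.
stage_latencies: [1.0, 1.0, 1.0, 1.5]
comm_cost: 0.1  # hypothetical per-hop communication cost
```

Per the README, such a file would then be selected with the `--config-name` flag, e.g. `uv run python main.py --config-name=model_A`.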
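A side note for readers comparing the strategies this commit enumerates: they mainly trade off pipeline bubble. Under the usual uniform-stage-latency assumption, the 1F1B schedule's idle share follows directly from its warmup and cooldown slots. This is an illustrative sketch, not code from this repo:

```python
def bubble_fraction(num_stages: int, num_batches: int) -> float:
    """Fraction of an idealized 1F1B schedule each device spends idle.

    With p pipeline stages and m microbatches (all stages taking equal
    time), every device sits through p - 1 warmup/cooldown slots, so
    the schedule spans m + p - 1 slots and the idle share is
    (p - 1) / (m + p - 1). Zero-bubble schedules such as ZB-1P shrink
    this by splitting the backward pass.
    """
    p, m = num_stages, num_batches
    return (p - 1) / (m + p - 1)

# The README's example settings: 4 stages, 8 microbatches -> 3/11 idle.
print(f"{bubble_fraction(4, 8):.3f}")  # prints 0.273
```

Interleaving more stages per device lowers this fraction at the cost of extra communication, which is exactly the trade-off the emulator's visualizations are meant to expose.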