# Pipeline Parallelism Emulation

This project provides tools for emulating and visualizing pipeline parallelism strategies used in large language model training.

## Overview

Pipeline parallelism trains large models by partitioning the model into stages across multiple devices and streaming batches of data through those stages in a pipelined fashion. This project allows you to:

- Simulate different pipeline parallelism strategies (1F1B, Interleaved)
- Visualize the execution schedule on multiple devices
- Compare different strategies for efficiency
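
To make the 1F1B idea concrete, here is a minimal sketch of the order in which a single device issues forward (`F`) and backward (`B`) passes. It is not taken from `src/strategies.py`; the function name `one_f_one_b_order` and its parameters are hypothetical, and the sketch assumes one stage per device and ignores timing entirely.

```python
def one_f_one_b_order(rank, num_devices, num_batches):
    """Return the (op, batch_index) sequence one device runs under 1F1B.

    Illustrative only: assumes one stage per device (hypothetical helper,
    not this project's API).
    """
    # Earlier devices run more warm-up forwards so later devices have work.
    warmup = min(num_devices - rank - 1, num_batches)
    steady = num_batches - warmup

    ops = [("F", mb) for mb in range(warmup)]
    # Steady state: alternate one forward with one backward.
    for i in range(steady):
        ops.append(("F", warmup + i))
        ops.append(("B", i))
    # Cool-down: drain the backwards that are still outstanding.
    ops += [("B", steady + i) for i in range(warmup)]
    return ops


if __name__ == "__main__":
    for rank in range(4):
        order = one_f_one_b_order(rank, num_devices=4, num_batches=8)
        print(rank, " ".join(f"{op}{mb}" for op, mb in order))
```

The last device alternates forward and backward from the start, while the first device runs `num_devices - 1` warm-up forwards before its first backward. That difference is what bounds the number of in-flight activations per device.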

## Features
- Supported Pipeline Strategies:
    - 1F1B
    - Interleaved 1F1B
- Visualization:
    - Interactive visualization dashboard using Plotly/Dash
- Config:
    - Configurable simulation parameters through Hydra
    - Stage-specific latency settings for performance projection

## Installation

This project uses [uv](https://github.com/astral-sh/uv) for dependency management.

Install `uv` if it is not already on your machine:
```bash
# On macOS and Linux.
curl -LsSf https://astral.sh/uv/install.sh | sh
```

## Usage

Run the 1F1B strategy:
```bash
uv run python main.py strategy=1f1b num_devices=4 num_stages=4 num_batches=8
```

Run the Interleaved 1F1B strategy:
```bash
uv run python main.py strategy=interleave num_devices=4 num_stages=8 num_batches=8
```
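
As a rough sanity check on what the two commands above should show, the idle ("bubble") share of a pipeline can be estimated in closed form. The sketch below is not part of this project. It assumes every stage takes the same time, communication is free, `num_batches` counts micro-batches, and the interleaved run places `num_stages / num_devices` model chunks on each device. The helper name `bubble_fraction` is hypothetical.

```python
def bubble_fraction(num_devices, num_batches, chunks_per_device=1):
    """Estimate the idle share of total pipeline time.

    Assumes uniform stage times and zero communication cost
    (hypothetical helper, not this project's API).
    """
    # Idle time per device, measured in units of one full forward+backward.
    bubble = (num_devices - 1) / chunks_per_device
    # Useful work per device is num_batches of the same unit.
    return bubble / (num_batches + bubble)


# Same parameters as the two commands above.
plain = bubble_fraction(num_devices=4, num_batches=8)
interleaved = bubble_fraction(num_devices=4, num_batches=8, chunks_per_device=8 // 4)

if __name__ == "__main__":
    print(f"1F1B bubble share:        {plain:.3f}")
    print(f"Interleaved bubble share: {interleaved:.3f}")
```

Under these assumptions the interleaved schedule roughly halves the idle time with two chunks per device. The simulated schedules may differ once stage-specific latencies are configured.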

## Configuration

The default configuration is in `conf/config.yaml`. You can override any parameter on the command line or create configuration groups for different scenarios.

### Using Different Configuration Files

You can use different configuration files with Hydra in several ways:

#### Recommended Approach

1. Create multiple configuration files in the `conf` directory for different use cases:
   ```
   conf/
   ├── config.yaml     # Default configuration
   └── model_A.yaml    # Create your own config with stage-specific latency for performance projection.
   ```

2. Run with your desired configuration using the `--config-name` flag:
   ```bash
   uv run python main.py --config-name=model_A
   ```

#### Override Specific Parameters

You can also override specific parameters at runtime:
```bash
uv run python main.py op_times.forward=0.5 op_times.backward=1.0 num_batches=6
```
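
Dotted keys such as `op_times.forward` address nested entries in the YAML configuration. Hydra handles this itself; the pure-Python sketch below only illustrates the mapping. The helper `apply_overrides` and the starting values in `base` are hypothetical and do not reflect the contents of `conf/config.yaml`.

```python
import ast


def apply_overrides(config, overrides):
    """Apply 'a.b=value' style overrides to a nested dict, in place.

    Illustrative stand-in for what Hydra does with command-line overrides.
    """
    for item in overrides:
        dotted_key, _, raw = item.partition("=")
        try:
            # Interpret numbers, booleans, etc.; fall back to plain strings.
            value = ast.literal_eval(raw)
        except (ValueError, SyntaxError):
            value = raw
        *parents, leaf = dotted_key.split(".")
        node = config
        for name in parents:
            node = node.setdefault(name, {})
        node[leaf] = value
    return config


# Hypothetical defaults, for illustration only.
base = {"num_batches": 8, "op_times": {"forward": 1.0, "backward": 2.0}}
result = apply_overrides(
    base, ["op_times.forward=0.5", "op_times.backward=1.0", "num_batches=6"]
)

if __name__ == "__main__":
    print(result)
```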

## Project Structure

```
PP-Emulation/
├── conf/                   # Hydra configuration files
│   └── config.yaml         # Default configuration
├── src/                    # Source code
│   ├── __init__.py         # Package initialization
│   ├── execution_model.py  # Schedule execution models
│   ├── strategies.py       # Pipeline parallelism strategies
│   └── visualizer.py       # Visualization utilities
├── main.py                 # Main entry point
├── pyproject.toml          # Project metadata and dependencies
└── README.md               # This file
```

## License

This project is licensed under the MIT License - see the LICENSE file for details.

## Contributing

Contributions are welcome! Please feel free to submit a Pull Request.