# Docker
Follow these instructions to set up and run our provided Docker image.

## Set Up Docker Engine and Docker Compose
You'll need to install Docker Engine on your development system. Note that while **Docker Engine** is free to use, **Docker Desktop** may require you to purchase a license. See the [Docker Engine Server installation instructions](https://docs.docker.com/engine/install/#server) for details.

To build and run this workload inside a Docker container, you also need Docker Compose. If it is not installed, consult the official [Docker Compose installation documentation](https://docs.docker.com/compose/install/linux/#install-the-plugin-manually), or install the plugin manually for the current user with the commands below.

```bash
# Install the Compose plugin under the current user's Docker config directory
DOCKER_CONFIG=${DOCKER_CONFIG:-$HOME/.docker}
mkdir -p "$DOCKER_CONFIG/cli-plugins"
curl -SL https://github.com/docker/compose/releases/download/v2.7.0/docker-compose-linux-x86_64 -o "$DOCKER_CONFIG/cli-plugins/docker-compose"
chmod +x "$DOCKER_CONFIG/cli-plugins/docker-compose"
# Confirm that Docker picks up the plugin
docker compose version
```
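Before building anything, it can help to confirm that the prerequisites are actually on the `PATH`. The following is a minimal sketch, not part of the provided tooling: `have_cmd` and `check_prereqs` are hypothetical helper names introduced here for illustration. It only checks for the `docker` binary; the Compose plugin is a `docker` subcommand, so `docker compose version` remains the way to verify it.

```bash
# Report whether a command is available on PATH.
# have_cmd is a hypothetical helper name, not part of Docker.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

# Print a one-line status for each tool and return non-zero if any is missing.
check_prereqs() {
    missing=0
    for tool in "$@"; do
        if have_cmd "$tool"; then
            echo "found: $tool"
        else
            echo "missing: $tool"
            missing=1
        fi
    done
    return "$missing"
}

# docker must be on PATH before the Compose plugin can be used.
check_prereqs docker || echo "Install Docker Engine before continuing."
```

The same function can be reused later with `check_prereqs helm kubectl` before the Kubernetes steps.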

## Set Up Docker Image
Build the images locally, or pull the prebuilt images instead.

```bash
cd docker
docker compose build
```
OR
```bash
docker pull intel/ai-tools:tlt-0.5.0
docker pull intel/ai-tools:tlt-devel-0.5.0
docker pull intel/ai-tools:tlt-dist-0.5.0
docker pull intel/ai-tools:tlt-dist-devel-0.5.0
```
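The four pull commands differ only in the variant part of the tag, so they can be generated in a loop. This is a sketch under assumptions: `tlt_images`, `pull_tlt_images`, `TLT_VERSION`, and `DRY_RUN` are hypothetical names introduced here, and only the `0.5.0` tags shown above are known to exist. With `DRY_RUN=1` the commands are printed rather than executed.

```bash
# Version taken from the pull commands above; other versions are an assumption.
TLT_VERSION="${TLT_VERSION:-0.5.0}"

# Print the four image references for a given version, one per line.
tlt_images() {
    for variant in tlt tlt-devel tlt-dist tlt-dist-devel; do
        echo "intel/ai-tools:${variant}-$1"
    done
}

# Pull every image; with DRY_RUN=1 only print the commands.
pull_tlt_images() {
    tlt_images "$1" | while read -r image; do
        if [ "${DRY_RUN:-0}" = "1" ]; then
            echo "docker pull $image"
        else
            docker pull "$image"
        fi
    done
}

# Preview the pull commands without contacting the registry.
DRY_RUN=1 pull_tlt_images "$TLT_VERSION"
```

Drop the `DRY_RUN=1` prefix to perform the pulls.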

## Use Docker Image
Use the TLT CLI without installing it locally by running it through the provided Docker image with Docker Compose.

```bash
docker compose run tlt-prod
# OR
docker compose run tlt-prod tlt --help
```
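If you call the CLI often, a small shell function can stand in for a local `tlt` command. This is a sketch, assuming it is run from the directory that holds the compose file: `tlt_docker` and `DRY_RUN` are hypothetical names, and `--rm` is added here (it is not in the commands above) so that each one-off container is removed when it exits.

```bash
# tlt_docker is a hypothetical wrapper name; tlt-prod is the service used above.
tlt_docker() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        # Print the command instead of starting a container.
        echo "docker compose run --rm tlt-prod tlt $*"
    else
        docker compose run --rm tlt-prod tlt "$@"
    fi
}

# Show the command that would run, without starting a container.
DRY_RUN=1 tlt_docker --help
```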

## Kubernetes
### 1. Install Helm
- Install [Helm](https://helm.sh/docs/intro/install/)
```bash
curl -fsSL -o get_helm.sh https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3 && \
chmod 700 get_helm.sh && \
./get_helm.sh
```
### 2. Set Up Training Operator
Install the standalone operator from GitHub or use a pre-existing Kubeflow configuration.
```bash
kubectl apply -k "github.com/kubeflow/training-operator/manifests/overlays/standalone"
```
OR
```bash
helm repo add cowboysysop https://cowboysysop.github.io/charts/
helm install <release name> cowboysysop/training-operator
```
### 3. Deploy TLT Distributed Job
For more information on customizing the deployment, see the chart [README](./docker/chart/README.md).
```bash
export NAMESPACE=kubeflow
helm install --namespace ${NAMESPACE} --set ... tlt-distributed ./docker/chart
```
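`helm install --namespace` does not create the namespace unless `--create-namespace` is also passed, so the target namespace must already exist. The sketch below shows one way to make sure of that before installing; `ensure_namespace` is a hypothetical helper name, and the `kubeflow` default is simply the value exported above.

```bash
NAMESPACE="${NAMESPACE:-kubeflow}"

# Create the namespace only when it does not exist yet.
# ensure_namespace is a hypothetical helper, not part of kubectl or Helm.
ensure_namespace() {
    kubectl get namespace "$1" >/dev/null 2>&1 ||
        kubectl create namespace "$1"
}

# Usage (requires access to a cluster):
#   ensure_namespace "$NAMESPACE"
```

Only the function is defined here; the call is left commented out because it needs a reachable cluster.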
### 4. View Progress
To view your workflow progress, inspect the MPIJob resource or read the launcher logs:
```bash
kubectl get -o yaml mpijob tf-tlt-distributed -n ${NAMESPACE}
```
OR
```bash
kubectl logs tf-tlt-distributed-launcher -n ${NAMESPACE}
```