# Container Cluster Launcher

## Prerequisites

- `pdsh`
- Passwordless SSH login between all nodes

Install pdsh:

```bash
wget https://github.com/chaos/pdsh/releases/download/pdsh-2.36/pdsh-2.36.tar.gz \
  && tar zxf pdsh-2.36.tar.gz \
  && cd pdsh-2.36 \
  && ./configure --without-rsh --with-ssh \
  && make -j \
  && make install
```
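
Before launching the cluster, it may help to confirm that both prerequisites are in place. The following is a minimal sketch, not part of the launcher itself: the host names `node1` and `node2` and the helper names `have_cmd` and `check_ssh` are hypothetical placeholders.

```bash
# Hypothetical host names; replace with your own nodes.
NODES="node1,node2"

# Return success if the given command is on PATH.
have_cmd() {
  command -v "$1" >/dev/null 2>&1
}

# Check passwordless SSH to one host.
# BatchMode makes ssh fail instead of prompting for a password.
check_ssh() {
  ssh -o BatchMode=yes -o ConnectTimeout=5 "$1" true
}

# Usage against real nodes (uncomment to run):
# have_cmd pdsh || echo "pdsh not found" >&2
# check_ssh node1 || echo "no passwordless SSH to node1" >&2
# pdsh -R ssh -w "$NODES" hostname
```

If the last command prints one hostname per node without asking for a password, `pdsh` and SSH are likely set up as the launcher expects.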

## Usage

Show help message:

```bash
./docker-cluster-up.sh -h
```

Before running the script, edit it to configure the cluster:

- The master container is always the current node.
- `DOCKER_MASTER` is the hostname of the master container.
- `WORKER_CONFIG` defines a map from physical hostnames to container hostnames for worker nodes.
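
To illustrate the idea of mapping physical hosts to container hostnames, here is a sketch under an assumed format. The `physical:container` pair syntax, the example host names, and the `list_workers` helper are all assumptions made for illustration; check the script itself for the syntax it actually expects.

```bash
# Assumed format, for illustration only; the real script may differ.
DOCKER_MASTER="c-master"
WORKER_CONFIG="node2:c-worker1 node3:c-worker2"

# Print "physical -> container" for each worker entry.
list_workers() {
  for pair in $WORKER_CONFIG; do
    # Text before the first ':' is the physical host, the rest is the container hostname.
    printf '%s -> %s\n' "${pair%%:*}" "${pair#*:}"
  done
}

list_workers
```

Each worker entry pairs the machine that runs the container with the hostname the container uses inside the cluster.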

Run the script:

```bash
./docker-cluster-up.sh \
  --image "harbor.sourcefind.cn:5443/dcu/admin/base/vllm:0.11.0-ubuntu22.04-dtk26.04-0130-py3.10-20260204" \
  --name cluster-dtk26-20260204 \
  --port 3333 \
  --workdir /path/to/benchmark/dir
```

This will:

- Resolve the IP addresses of all nodes
- Pull the image on each node
- Start the container on each node
  - Mount the workdir as `/workspace` in the master node container
  - Create the directory `/workspace` in worker node containers
  - Add the container hostnames to `/etc/hosts`
  - Listen for SSH connections on port `3333` (the `--port` value)
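
The hostname step above can be pictured as follows. This is only a sketch of one possible approach, not the script's actual implementation: the helpers `resolve_ip` and `hosts_line` are hypothetical, `getent` is assumed to be available (as on most Linux systems), and the address comes from a documentation-only range.

```bash
# Resolve a hostname to its first address via the system resolver.
resolve_ip() {
  getent hosts "$1" | awk '{ print $1; exit }'
}

# Format one /etc/hosts line: "<ip><TAB><container-hostname>".
hosts_line() {
  printf '%s\t%s\n' "$1" "$2"
}

# Example with a literal address and a hypothetical container hostname:
hosts_line 192.0.2.10 c-master

# With a real node, the two helpers could be combined (uncomment to run):
# hosts_line "$(resolve_ip node2)" c-worker1
```

Once every container has such entries, the containers can reach each other by container hostname, for example with `ssh -p 3333 c-worker1` (hypothetical hostname).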