Commit 16cd154d authored by one

[cluster] Add a readme

parent b094b3f8
# Container Cluster Launcher
## Prerequisites
- pdsh
- Passwordless SSH login between all nodes
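To confirm the passwordless-SSH prerequisite, a check along these lines can be run from the node where you will launch the script (the node that hosts the master container). This is a minimal sketch: `check_passwordless_ssh` is a hypothetical helper, the node names are placeholders, and it assumes keys were already distributed (for example with `ssh-keygen` and `ssh-copy-id`).

```bash
#!/usr/bin/env bash
# Verify that each node accepts key-based SSH logins without a prompt.
# check_passwordless_ssh is a hypothetical helper; pass your physical hostnames.
check_passwordless_ssh() {
  local host failed=0
  for host in "$@"; do
    # BatchMode=yes makes ssh fail instead of asking for a password;
    # ConnectTimeout bounds the wait for unreachable hosts.
    if ssh -o BatchMode=yes -o ConnectTimeout=5 "$host" true 2>/dev/null; then
      echo "ok: $host"
    else
      echo "FAIL: $host" >&2
      failed=1
    fi
  done
  return "$failed"
}

# Example (hypothetical hostnames):
# check_passwordless_ssh node01 node02 node03 node04
```

The function returns non-zero if any node fails, so it can gate later steps in a script.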
Install pdsh:
```bash
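# Builds pdsh with the ssh rcmd module and installs it under /usr/local;
# run as root (or prefix `make install` with sudo).
wget https://github.com/chaos/pdsh/releases/download/pdsh-2.36/pdsh-2.36.tar.gz \
&& tar zxf pdsh-2.36.tar.gz \
&& cd pdsh-2.36 \
&& ./configure --without-rsh --with-ssh \
&& make -j"$(nproc)" \
&& make install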
```
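After the build, a quick smoke test can confirm that pdsh is on `PATH` and can reach the nodes over SSH. This is a sketch: `pdsh_smoke_test` is a hypothetical helper and the host list is a placeholder; `-R ssh` (select the rcmd module) and `-w` (target host list) are standard pdsh options.

```bash
#!/usr/bin/env bash
# Smoke test for a fresh pdsh build: run `hostname` on every listed node
# over SSH. Takes a pdsh host list such as "node01,node02" (placeholders).
pdsh_smoke_test() {
  local hosts="$1"
  if ! command -v pdsh >/dev/null 2>&1; then
    echo "pdsh not found in PATH (is /usr/local/bin on PATH?)" >&2
    return 1
  fi
  # -R ssh selects the ssh rcmd module explicitly instead of relying on
  # the compiled-in default or the PDSH_RCMD_TYPE environment variable.
  pdsh -R ssh -w "$hosts" hostname
}

# Example (hypothetical hostnames):
# pdsh_smoke_test node01,node02,node03,node04
```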
## Usage
Show the help message:
```bash
./docker-cluster-up.sh -h
```
Before running the script, edit its configuration variables:
- The master container always runs on the current node (the node where you run the script).
- `DOCKER_MASTER` is the hostname of the master container.
- `WORKER_CONFIG` maps each worker node's physical hostname to the hostname of the container running on it.
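`WORKER_CONFIG` is described only as a map, so the snippet below assumes a hypothetical layout of one `physical_host container_host` pair per line and shows how such a string could be loaded into a bash associative array (bash 4 or later). The actual format inside `docker-cluster-up.sh` may differ; `parse_worker_config`, `WORKER_MAP`, and the host names are illustrative only.

```bash
#!/usr/bin/env bash
# Hypothetical WORKER_CONFIG layout: one "physical_host container_host" pair
# per line. The real docker-cluster-up.sh may use a different format.
WORKER_CONFIG="
# physical_host container_host
node02 worker1
node03 worker2
"

declare -A WORKER_MAP=()  # physical hostname -> container hostname

parse_worker_config() {
  local physical container
  # A here-string (not a pipe) keeps the loop in the current shell,
  # so assignments to WORKER_MAP survive after the loop ends.
  while read -r physical container; do
    # Skip blank lines and comment lines.
    [[ -z "$physical" || "$physical" == \#* ]] && continue
    if [[ -z "$container" ]]; then
      echo "no container hostname for $physical" >&2
      return 1
    fi
    WORKER_MAP["$physical"]="$container"
  done <<< "$1"
}

parse_worker_config "$WORKER_CONFIG"
for host in "${!WORKER_MAP[@]}"; do
  echo "$host -> ${WORKER_MAP[$host]}"
done
```

Associative arrays do not preserve insertion order, so the printed pairs may appear in any order.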
Run the script:
```bash
./docker-cluster-up.sh \
--image "harbor.sourcefind.cn:5443/dcu/admin/base/vllm:0.11.0-ubuntu22.04-dtk26.04-0130-py3.10-20260204" \
--name cluster-dtk26-20260204 \
--port 3333 \
--workdir /path/to/benchmark/dir
```
This will:
- Resolve the IP addresses of all nodes
- Pull the image on each node
- Start the container on each node
- Mount the `--workdir` directory at `/workspace` in the master container
- Create the directory `/workspace` in each worker container
- Add hostnames to `/etc/hosts`
- Listen for SSH connections on port `3333` (the `--port` value)
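The name-resolution and `/etc/hosts` steps can be pictured with the sketch below. It assumes, without confirmation from the script, that the containers use host networking, so each container hostname maps to its node's IP address. `resolve_ipv4`, `hosts_entries`, and `append_hosts` are hypothetical helpers, not functions from `docker-cluster-up.sh`.

```bash
#!/usr/bin/env bash
# Sketch of the "resolve IPs" and "add hostnames to /etc/hosts" steps.
# Assumes host networking (an assumption): container host -> node IP.

# Print the first IPv4 address that the resolver returns for a host.
resolve_ipv4() {
  # `getent ahostsv4` prints lines of the form "IP  STREAM  name".
  getent ahostsv4 "$1" | awk 'NR == 1 { print $1 }'
}

# Print one "IP<TAB>container_host" line per pair of arguments:
#   hosts_entries physical1 container1 physical2 container2 ...
hosts_entries() {
  local ip
  while (( $# >= 2 )); do
    ip=$(resolve_ipv4 "$1")
    if [[ -z "$ip" ]]; then
      echo "cannot resolve $1" >&2
      return 1
    fi
    printf '%s\t%s\n' "$ip" "$2"
    shift 2
  done
}

# Append lines from stdin to a hosts file, skipping exact duplicates so the
# step can be re-run safely. Defaults to /etc/hosts (needs root).
append_hosts() {
  local file="${1:-/etc/hosts}" line
  while IFS= read -r line; do
    grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
  done
}

# Example (hypothetical hostnames), run inside each container:
# hosts_entries node01 master node02 worker1 | append_hosts /etc/hosts
```

Under the same host-networking assumption, the SSH server in a container would be reachable from another machine with `ssh -p 3333 <physical-host>`.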
@@ -15,7 +15,7 @@ master node04
 "
 # =====================================================================
-# Defaults
+# Defaults and command-line arguments
 # =====================================================================
 IMAGE_NAME=harbor.sourcefind.cn:5443/dcu/admin/base/vllm:0.11.0-ubuntu22.04-dtk26.04-0130-py3.10-20260204
 CONTAINER_NAME=cluster-dtk26-20260204
......