# DeepSeek-V3-0324

Steps for deploying DeepSeek-V3-0324 (bf16) across four nodes.

## Table of Contents

[TOC]

## 1. Environment Setup

Prepare the container environment on every node.

```shell
docker pull harbor.sourcefind.cn:5443/dcu/admin/base/vllm:0.9.2-ubuntu22.04-dtk25.04.2-py3.10
docker run --shm-size 500g --network=host --name=limeng_test2 --privileged \
  --device=/dev/kfd --device=/dev/dri --group-add video --cap-add=SYS_PTRACE --security-opt seccomp=unconfined \
  -v ~/:/workspace/ -v /public/opendas/DL_DATA/llm-models/:/home/models:ro -v /opt/hyhal:/opt/hyhal:ro \
  -it harbor.sourcefind.cn:5443/dcu/admin/base/vllm:0.9.2-ubuntu22.04-dtk25.04.2-py3.10 bash
```
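
Before moving on, a quick check inside each container can catch a missing device or mount early. This is a minimal sketch: `check_paths` is a hypothetical helper, and the paths in the usage line are simply the devices and mount targets from the `docker run` command, plus the model directory used later when starting the service.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight helper: report every path that does not exist.
# Returns 0 if all paths exist, 1 otherwise.
check_paths() {
  local missing=0 p
  for p in "$@"; do
    if [ ! -e "$p" ]; then
      echo "missing: $p" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage inside the container on each node:
# check_paths /dev/kfd /dev/dri /opt/hyhal /workspace /home/models/DeepSeek-V3-0324-bf16
```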

## 2. Set Environment Variables

On every node, add the following environment variables to `~/.bashrc`, then restart the container.

```shell
export ALLREDUCE_STREAM_WITH_COMPUTE=1
# On the BW cluster, VLLM_HOST_IP must be set to the IP address of the InfiniBand (IB) interface
export VLLM_HOST_IP=$(hostname -I | awk '{print $2}')
echo $VLLM_HOST_IP
export NCCL_SOCKET_IFNAME=ib0
export GLOO_SOCKET_IFNAME=ib0
export HIP_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
# Run ibstat to check which HCAs are Active, and list only those here
export NCCL_IB_HCA=mlx5_2:1,mlx5_3:1,mlx5_4:1,mlx5_5:1,mlx5_6:1,mlx5_7:1,mlx5_8:1,mlx5_9:1
export NCCL_MIN_NCHANNELS=16
export NCCL_MAX_NCHANNELS=16
export NCCL_NET_GDR_READ=1
export VLLM_RPC_TIMEOUT=1800000
export NCCL_TOPO_FILE="/workspace/limeng/topo-input.xml"

unset NCCL_ALGO
export NCCL_NET_GDR_LEVEL=7
export NCCL_SDMA_COPY_ENABLE=0

export VLLM_USE_OPT_ZEROS=1
```
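
In the block above, `VLLM_HOST_IP` is taken as the second address printed by `hostname -I`, which only works if the IB address happens to be listed second, and `NCCL_IB_HCA` is filled in by reading `ibstat` by hand. A minimal sketch that derives both from command output instead, assuming the IB interface is `ib0` (as in `NCCL_SOCKET_IFNAME`) and the usual `ibstat` text layout (`CA '<name>'`, `Port <n>:`, `State: Active`). The helper names `ib_ipv4` and `active_hcas` are hypothetical, and `active_hcas` does not filter by link layer.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; compare their output with your own `ip` and `ibstat` output before use.

# Print the first IPv4 address (without prefix length) from
# `ip -o -4 addr show dev <iface>` output read on stdin.
ib_ipv4() {
  awk '{ for (i = 1; i < NF; i++) if ($i == "inet") { split($(i + 1), a, "/"); print a[1]; exit } }'
}

# Turn `ibstat` output read on stdin into a comma-separated list of active
# <hca>:<port> pairs, e.g. "mlx5_2:1,mlx5_3:1", suitable for NCCL_IB_HCA.
active_hcas() {
  awk '
    /^CA /                 { ca = $2; gsub(/[^A-Za-z0-9_]/, "", ca) }
    /^[ \t]*Port [0-9]+:/  { port = $2; sub(/:$/, "", port) }
    /^[ \t]*State: Active/ { out = out (out == "" ? "" : ",") ca ":" port }
    END                    { print out }
  '
}

# Usage on each node (assumes the IB interface is ib0):
# export VLLM_HOST_IP=$(ip -o -4 addr show dev ib0 | ib_ipv4)
# export NCCL_IB_HCA=$(ibstat | active_hcas)
```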

## 3. Start the Ray Cluster

```shell
# On the head node:
ray start --head --node-ip-address=<head_node_ip> --port=6688 --num-gpus=8 --num-cpus=32

# On each worker node in turn (one command per worker node):
ray start --address=<head_node_ip>:6688 --node-ip-address=<worker1_ip> --num-gpus=8 --num-cpus=32
ray start --address=<head_node_ip>:6688 --node-ip-address=<worker2_ip> --num-gpus=8 --num-cpus=32
ray start --address=<head_node_ip>:6688 --node-ip-address=<worker3_ip> --num-gpus=8 --num-cpus=32
```
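
Before starting vLLM, it is worth confirming that all four nodes have joined: 4 nodes × 8 GPUs should give 32 GPUs, which is exactly the `tp=32` used when starting the service. `ray status` shows this interactively; the sketch below checks it from a script. `ray_gpu_total` and `gpu_count_ok` are hypothetical helpers, and `ray_gpu_total` assumes Ray's Python API (`ray.init(address="auto")`, `ray.cluster_resources()`) is importable with the default `python` in the container.

```shell
#!/usr/bin/env bash
# Expected cluster size for this deployment: 4 nodes x 8 GPUs each.
NUM_NODES=4
GPUS_PER_NODE=8
EXPECTED_GPUS=$((NUM_NODES * GPUS_PER_NODE))   # 32, matching tp=32 in the serve script

# Ask the running Ray cluster how many GPUs it has registered (run on the head node).
# tail keeps only the last stdout line in case Ray prints extra output.
ray_gpu_total() {
  python -c "import ray; ray.init(address='auto'); print(int(ray.cluster_resources().get('GPU', 0)))" | tail -n 1
}

# Succeeds only if the reported GPU count ($1) equals the expected total ($2).
gpu_count_ok() {
  [ "$1" -eq "$2" ]
}

# Usage:
# if gpu_count_ok "$(ray_gpu_total)" "$EXPECTED_GPUS"; then echo "cluster ready"; else echo "waiting for workers"; fi
```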

## 4. Start the Service on the Head Node

```shell

model_path=/home/models/DeepSeek-V3-0324-bf16
model=${model_path##*/}
data_type="bfloat16"
tp=32
port=8899
gpu_memory=0.9

# Date-stamped log directory
log_date=$(date "+%Y-%m-%d")
time=$(date "+%Y-%m-%d-%H-%M-%S")
log_dir="bw1000_${model}/${log_date}"
mkdir -p "${log_dir}"

vllm serve ${model_path} \
 --trust-remote-code \
 --distributed-executor-backend ray \
 --dtype $data_type \
 --tensor-parallel-size $tp \
 --gpu-memory-utilization $gpu_memory \
 --disable-cascade-attn \
 --host 0.0.0.0 \
 --port $port \
 --max-model-len 40960 \
 --max-seq-len-to-capture 40960 \
 --max-num-batched-tokens 40960 \
 --disable-log-requests \
 --max-num-seqs 1024 \
 --block-size 64 \
 --speculative-config '{"method": "deepseek_mtp", "num_speculative_tokens": 3}' \
 --enable-chunked-prefill \
 --enable-prefix-caching \
 2>&1 | tee "${log_dir}/serve_${time}.log"
 
```
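
Loading the bf16 weights across four nodes can take a long time, so a scripted readiness check helps before running the benchmark. A minimal sketch, assuming vLLM's OpenAI-compatible server exposes `/health` and `/v1/completions` on the port above; since `--served-model-name` is not set, the model name in requests is the model path itself. `wait_for_server` and `completion_body` are hypothetical helpers, and the 1800 s default timeout is an arbitrary choice.

```shell
#!/usr/bin/env bash
PORT=8899
MODEL_PATH=/home/models/DeepSeek-V3-0324-bf16   # also the served model name here

# Poll ${url}/health every 10 s until it answers or the timeout (in seconds) expires.
wait_for_server() {
  local url=$1 timeout=${2:-1800} waited=0
  until curl -sf "${url}/health" > /dev/null; do
    if [ "$waited" -ge "$timeout" ]; then
      return 1
    fi
    sleep 10
    waited=$((waited + 10))
  done
}

# Build a minimal /v1/completions request body: $1 = model, $2 = prompt, $3 = max_tokens.
# No JSON escaping is done, so keep double quotes and backslashes out of the prompt.
completion_body() {
  printf '{"model": "%s", "prompt": "%s", "max_tokens": %d}\n' "$1" "$2" "${3:-64}"
}

# Usage on the head node:
# wait_for_server "http://127.0.0.1:${PORT}" && \
#   curl -s "http://127.0.0.1:${PORT}/v1/completions" \
#     -H "Content-Type: application/json" \
#     -d "$(completion_body "$MODEL_PATH" "Hello, my name is" 32)"
```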

Online inference benchmark:

```shell
python benchmark_serving.py --model /home/models/DeepSeek-V3-0324-bf16 --dataset-name random --trust-remote-code \
  --random-input-len 12000 --random-output-len 550 --port 8899 --ignore-eos \
  --max-concurrency 6 --num-prompts 12
```
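
The benchmark parameters can be checked against the server limits with simple arithmetic. Each request needs 12000 + 550 = 12550 tokens of context, well within `--max-model-len 40960`. Six concurrent prompts carry 6 × 12000 = 72000 prompt tokens, more than the per-step budget `--max-num-batched-tokens 40960`, so their prefills cannot all fit in one scheduler step and chunked prefill spreads them over at least two. A small sketch of these checks; `fits_context` and `min_prefill_steps` are hypothetical helpers, and the step count is only a lower bound because decode tokens share the same budget.

```shell
#!/usr/bin/env bash
# Does one request (prompt + generated tokens) fit in the context window?
fits_context() {
  # $1 = input tokens, $2 = output tokens, $3 = --max-model-len
  [ $(( $1 + $2 )) -le "$3" ]
}

# Lower bound on scheduler steps needed to prefill N concurrent prompts,
# given the per-step token budget (--max-num-batched-tokens); ceiling division.
min_prefill_steps() {
  # $1 = concurrency, $2 = input tokens per prompt, $3 = token budget per step
  echo $(( ($1 * $2 + $3 - 1) / $3 ))
}

fits_context 12000 550 40960 && echo "12550 tokens per request fits in 40960"
min_prefill_steps 6 12000 40960   # prints 2
```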