Commit 899327de authored by chenych's avatar chenych
Browse files

Update README

parent 126ffe1b
......@@ -20,9 +20,11 @@ DeepSeek-V3.2,这是一个在高计算效率与卓越推理和代理性能之
| DTK | 25.04.1 |
| python | 3.10.12 |
| transformers | 4.56.1 |
| vllm | 0.9.2+das.opt1.dtk25042 |
| torch | 2.5.1+das.opt1.dtk25041 |
推荐使用镜像:
推荐使用镜像: image.sourcefind.cn:5000/dcu/admin/base/vllm:0.9.2-ubuntu22.04-dtk25.04.2-das1.7-py3.10-20251203
- 挂载地址`-v` 根据实际模型情况修改
```bash
docker run -it \
......@@ -98,13 +100,14 @@ cp inference/config.json /path/to/DeepSeek-V3.2-bf16
```bash
export VLLM_USE_V32_ENCODE=1
export ALLREDUCE_STREAM_WITH_COMPUTE=1
export VLLM_HOST_IP=x.x.x.x # 对应计算节点的IP,建议选择IB口SOCKET_IFNAME对应IP地址
export HIP_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
export HSA_FORCE_FINE_GRAIN_PCIE=1
export NCCL_SOCKET_IFNAME=ibxxxx
export GLOO_SOCKET_IFNAME=ibxxxx
export NCCL_IB_HCA=mlx5_0:1
export NCCL_IB_HCA=mlx5_0:1 # 根据 ibstat 查看
unset NCCL_ALGO
export NCCL_IB_DISABLE=0
export NCCL_MAX_NCHANNELS=16
......@@ -114,8 +117,6 @@ export NCCL_DEBUG=INFO
export NCCL_MIN_P2P_NCHANNELS=16
export NCCL_NCHANNELS_PER_PEER=16
export HIP_USE_GRAPH_QUEUE_POOL=1
export VLLM_ENABLE_MOE_FUSED_GATE=0
export VLLM_ENFORCE_EAGER_BS_THRESHOLD=44
export VLLM_RPC_TIMEOUT=1800000
export VLLM_USE_FLASH_MLA=1
......@@ -129,10 +130,6 @@ export VLLM_RANK4_NUMA=4
export VLLM_RANK5_NUMA=5
export VLLM_RANK6_NUMA=6
export VLLM_RANK7_NUMA=7
# BW集群需要额外设置的环境变量
export NCCL_NET_GDR_LEVEL=7
export NCCL_SDMA_COPY_ENABLE=0
```
3. 启动RAY集群
......
Markdown is supported
0% or .
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment