Commit 8bb665c7 authored by chenych's avatar chenych

Update docker image and add glm-5-fp8.

parent 3994fb27
@@ -7,14 +7,14 @@
## Environment Dependencies
| Software | Version |
-| :------: | :------: |
-| DTK | 26.04.2 |
+| :------: |:-------:|
+| DTK | 26.04 |
| python | 3.10.12 |
-| transformers | 5.2.0.dev0 |
-| torch | 2.5.1+das.opt1.dtk2604.20260116.g78471bfd |
-| vllm | 0.11.0+das.opt1.rc3.dtk2604 |
+| transformers | 5.2.0 |
+| torch | 2.9.0+das.opt1.dtk2604.20260331.g4e3c1e7 |
+| vllm | 0.15.1+das.opt1.alpha.dtk2604.torch290.2604042155.gba9f96 |
-Recommended image: harbor.sourcefind.cn:5443/dcu/admin/base/vllm:0.11.0-ubuntu22.04-dtk26.04-0130-py3.10-20260202
+Currently, only the following image is supported: harbor.sourcefind.cn:5443/dcu/admin/base/custom:vllm015-ubuntu22.04-dtk26.04-glm5-0408
- Adjust the mount paths (`-v`) to match the actual locations of your model and data
@@ -33,18 +33,11 @@ docker run -it \
-u root \
-v /opt/hyhal/:/opt/hyhal/:ro \
-v /path/your_code_data/:/path/your_code_data/ \
-harbor.sourcefind.cn:5443/dcu/admin/base/vllm:0.11.0-ubuntu22.04-dtk26.04-0130-py3.10-20260202 bash
+harbor.sourcefind.cn:5443/dcu/admin/base/custom:vllm015-ubuntu22.04-dtk26.04-glm5-0408 bash
```
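Once inside the container, it can help to confirm that the accelerator tooling is actually visible before installing or running anything. A minimal sketch, assuming the DTK base image ships `rocm-smi`:

```shell
# Sketch: confirm the accelerator tooling is visible inside the container.
# `rocm-smi` is assumed to be provided by the DTK base image.
if command -v rocm-smi >/dev/null 2>&1; then
  rocm-smi
  status="rocm-smi available"
else
  status="rocm-smi not found: run this check inside the DTK container"
fi
echo "$status"
```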
More images are available for download from [光源](https://sourcefind.cn/#/service-list).
The specialized deep-learning libraries this project requires for DCU cards can be downloaded from the [光合](https://developer.sourcefind.cn/tool/) developer community; install the remaining packages according to requirements.txt:
-```
-pip uninstall vllm
-pip install vllm-0.11.0+das.opt1.rc3.dtk2604-cp310-cp310-linux_x86_64.whl
-pip install -r requirements.txt
-```
## Datasets
`None yet`
@@ -53,6 +46,72 @@ pip install -r requirements.txt
## Inference
### vllm
#### Single-node Inference
1. Set the environment variables
```bash
# environment variables
rm -rf ~/.cache    # clear stale local caches before launching
rm -rf ~/.triton   # clear the Triton kernel cache
export HIP_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
export ALLREDUCE_STREAM_WITH_COMPUTE=1
export NCCL_MIN_NCHANNELS=16
export NCCL_MAX_NCHANNELS=16
export Allgather_Base_STREAM_WITH_COMPUTE=1
export SENDRECV_STREAM_WITH_COMPUTE=1
export HIP_KERNEL_EVENT_SYSTENFENCE=1
export VLLM_RPC_TIMEOUT=1800000
export VLLM_USE_PD_SPLIT=1
export VLLM_USE_PIECEWISE=1
export VLLM_REJECT_SAMPLE_OPT=1
export USE_FUSED_RMS_QUANT=0
export USE_FUSED_SILU_MUL_QUANT=1
export VLLM_USE_GLOBAL_CACHE13=1
export VLLM_FUSED_MOE_CHUNK_SIZE=16384
export VLLM_CUSTOM_CACHE=1
export VLLM_USE_OPT_CAT=1
export VLLM_USE_FUSED_FILL_RMS_CAT=1
export VLLM_USE_LIGHTOP_MOE_SUM_MUL_ADD=0
export VLLM_USE_LIGHTOP_RMS_ROPE_CONCAT=0
export VLLM_USE_V32_ENCODE=1
export VLLM_USE_FLASH_MLA=1
export VLLM_DISABLE_DSA=0
export USE_LIGHTOP_TOPK=1
export USE_LIGHTOP_PER_TOKEN_GROUP_QUANT_FP8=1
export USE_LIGHTOP_CONVERT_REQ_INDEX_TO_GLOBAL_INDEX=1
```
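Since launches often happen from fresh shells, a quick check that the critical variables above actually took effect can save a failed startup. A minimal sketch using a few of the variables from the list (extend `required_vars` as needed):

```shell
# Sanity check (sketch): fail early if key variables from the list above
# are unset in the current shell. Extend required_vars as needed.
required_vars="HIP_VISIBLE_DEVICES NCCL_MIN_NCHANNELS VLLM_USE_FLASH_MLA"
missing=""
for v in $required_vars; do
  eval "val=\${$v:-}"
  if [ -z "$val" ]; then
    missing="$missing $v"
  fi
done
if [ -n "$missing" ]; then
  echo "missing:$missing"
else
  echo "all required variables are set"
fi
```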
2. Start vllm serve
```bash
vllm serve ZhipuAI/GLM-5-FP8 \
--gpu-memory-utilization 0.925 \
--port 8001 \
--tensor-parallel-size 8 \
--tool-call-parser glm47 \
--reasoning-parser glm45 \
--enable-auto-tool-choice \
--kv-cache-dtype fp8_ds_mla \
--served-model-name glm-5-fp8 \
--disable-log-requests \
--compilation-config '{"pass_config": {"fuse_act_quant": false}}'
```
3. Once the server is up, it can be accessed as follows:
```bash
curl http://localhost:8001/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "glm-5-fp8",
"messages": [
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "Summarize GLM-5 in one sentence."}
],
"max_tokens": 4096,
"temperature": 0.7,
"chat_template_kwargs": {"enable_thinking": false}
}'
```
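To pull just the assistant text out of the JSON that the curl call returns, the response can be piped through a small parser. The sketch below uses a hand-written sample `response` following the OpenAI-compatible schema that vLLM serves; in practice the JSON would come from the curl command above:

```shell
# Sketch: extract the assistant message from a chat-completions response.
# The sample JSON is a stand-in for the output of the curl call above.
response='{"choices":[{"message":{"role":"assistant","content":"GLM-5 is a large language model."}}]}'
text=$(printf '%s' "$response" | python3 -c 'import json,sys; print(json.load(sys.stdin)["choices"][0]["message"]["content"])')
echo "$text"
```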
#### Multi-node Inference
1. Set the environment variables
> Note:
@@ -107,9 +166,9 @@ ray start --head --node-ip-address=x.x.x.x --port=6379 --num-gpus=8 --num-cpus=3
ray start --address='x.x.x.x:6379' --num-gpus=8 --num-cpus=32
```
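Before launching the server, it is worth checking that the tensor-parallel degree matches the total GPU count across the Ray cluster. A small arithmetic sanity check (node and GPU counts here are illustrative, matching the `ray start` commands above):

```shell
# Sanity check (sketch): tensor-parallel size should equal the total number
# of GPUs in the Ray cluster. Counts below are illustrative.
nodes=4           # assumed: one head node plus three workers
gpus_per_node=8   # matches --num-gpus=8 in the ray start commands
tp_size=32        # matches --tensor-parallel-size 32 used in this section
total=$((nodes * gpus_per_node))
if [ "$total" -eq "$tp_size" ]; then
  echo "ok: ${tp_size}-way tensor parallelism over ${total} GPUs"
else
  echo "mismatch: ${total} GPUs but tensor-parallel-size=${tp_size}"
fi
```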
-3. Start vllm server
+3. Start vllm serve
```bash
-vllm serve zai-org/GLM-5 \
+vllm serve ZhipuAI/GLM-5 \
--port 8001 \
--trust-remote-code \
--tensor-parallel-size 32 \
@@ -125,7 +184,7 @@ vllm serve zai-org/GLM-5 \
--served-model-name glm-5
```
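A large tensor-parallel deployment can take a while to load, so rather than sending requests blindly it can help to poll the server first. A sketch that probes the OpenAI-compatible `/v1/models` endpoint (port and retry count are illustrative):

```shell
# Readiness probe (sketch): poll the OpenAI-compatible /v1/models endpoint
# until the server answers, instead of sending requests blindly.
wait_for_server() {
  port="$1"
  max_tries="$2"
  i=0
  until curl -sf "http://localhost:${port}/v1/models" >/dev/null 2>&1; do
    i=$((i + 1))
    if [ "$i" -ge "$max_tries" ]; then
      echo "server not reachable"
      return 1
    fi
    sleep 5
  done
  echo "server is ready"
}
# Example: wait up to ~5 minutes for the serve command above to come up.
# wait_for_server 8001 60
```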
-Once the server is up, it can be accessed as follows:
+4. Once the server is up, it can be accessed as follows:
```bash
curl http://localhost:8001/v1/chat/completions \
-H "Content-Type: application/json" \
@@ -151,7 +210,8 @@ curl http://localhost:8001/v1/chat/completions \
## Pretrained Weights
| Model | Weight Size | DCU Model | Min. Cards Required | Download |
|:-----:|:----------:|:----------:|:---------------------:|:----------:|
-| GLM-5 | 744B | BW1000 | 32 | [Hugging Face](https://huggingface.co/zai-org/GLM-5) |
+| GLM-5 | 744B | BW1000 | 32 | [ModelScope](https://modelscope.cn/models/ZhipuAI/GLM-5) |
+| GLM-5-FP8 | 744B | BW1100 | 8 | [ModelScope](https://modelscope.cn/models/ZhipuAI/GLM-5-FP8) |
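The ModelScope links in the table can also be fetched from the command line. A hedged sketch using the ModelScope CLI; the `modelscope download` flags are assumed from recent CLI versions, and the target directory is illustrative:

```shell
# Hedged sketch: fetch the FP8 weights with the ModelScope CLI
# (assumes `pip install modelscope`; flags may vary across CLI versions).
model_id="ZhipuAI/GLM-5-FP8"
local_dir="./GLM-5-FP8"
if command -v modelscope >/dev/null 2>&1; then
  if modelscope download --model "$model_id" --local_dir "$local_dir"; then
    status="downloaded to $local_dir"
  else
    status="download failed"
  fi
else
  status="modelscope CLI not installed (pip install modelscope)"
fi
echo "$status"
```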
## Source Repository and Issue Reporting
- https://developer.sourcefind.cn/codes/modelzoo/glm-5_vllm
@@ -11,4 +11,4 @@ appCategory=对话问答
# framework type
frameType=vllm
# accelerator type
-accelerateType=BW1000
\ No newline at end of file
+accelerateType=BW1000,BW1100
\ No newline at end of file