ModelZoo / GLM-5_vllm · Commit 8bb665c7
authored Apr 08, 2026 by chenych · parent 3994fb27

Update docker image and add glm-5-fp8.

Showing 3 changed files with 81 additions and 21 deletions:
- README.md (+80, -20)
- model.properties (+1, -1)
- vllm-0.11.0+das.opt1.rc3.dtk2604-cp310-cp310-linux_x86_64.whl (+0, -0, deleted)
README.md
@@ -6,15 +6,15 @@
 As Zhipu AI's new-generation flagship large model, GLM-5 targets complex systems engineering and long-horizon agentic tasks. Scaling model size remains one of the most important routes to improving the intelligence efficiency of artificial general intelligence (AGI). Compared with GLM-4.5, GLM-5 grows from 355B parameters (32B activated) to 744B parameters (40B activated), and its pretraining corpus grows from 23T to 28.5T tokens. GLM-5 also integrates the DeepSeek Sparse Attention (DSA) mechanism, which sharply reduces deployment cost while preserving long-context capability.
 ## Environment dependencies
-| Software | Version |
-| :------: | :------: |
-| DTK | 26.04.2 |
-| python | 3.10.12 |
-| transformers | 5.2.0.dev0 |
-| torch | 2.5.1+das.opt1.dtk2604.20260116.g78471bfd |
-| vllm | 0.11.0+das.opt1.rc3.dtk2604 |
+| Software | Version |
+| :------: | :------: |
+| DTK | 26.04 |
+| python | 3.10.12 |
+| transformers | 5.2.0 |
+| torch | 2.9.0+das.opt1.dtk2604.20260331.g4e3c1e7 |
+| vllm | 0.15.1+das.opt1.alpha.dtk2604.torch290.2604042155.gba9f96 |
-Recommended image: harbor.sourcefind.cn:5443/dcu/admin/base/vllm:0.11.0-ubuntu22.04-dtk26.04-0130-py3.10-20260202
+Currently the only supported image is: harbor.sourcefind.cn:5443/dcu/admin/base/custom:vllm015-ubuntu22.04-dtk26.04-glm5-0408
 - Adjust the `-v` mount paths to match your actual model location
@@ -33,18 +33,11 @@ docker run -it \
 	-u root \
 	-v /opt/hyhal/:/opt/hyhal/:ro \
 	-v /path/your_code_data/:/path/your_code_data/ \
-	harbor.sourcefind.cn:5443/dcu/admin/base/vllm:0.11.0-ubuntu22.04-dtk26.04-0130-py3.10-20260202 bash
+	harbor.sourcefind.cn:5443/dcu/admin/base/custom:vllm015-ubuntu22.04-dtk26.04-glm5-0408 bash
 ```
 More images are available for download from [光源](https://sourcefind.cn/#/service-list).
-The special deep-learning libraries that the DCU cards in this project require can be downloaded from the [光合](https://developer.sourcefind.cn/tool/) developer community; install the remaining packages per requirements.txt:
-```
-pip uninstall vllm
-pip install vllm-0.11.0+das.opt1.rc3.dtk2604-cp310-cp310-linux_x86_64.whl
-pip install -r requirements.txt
-```
 ## Dataset
 `None`
@@ -53,6 +46,72 @@ pip install -r requirements.txt
 ## Inference
 ### vllm
+#### Single-node inference
+1. Set the environment variables:
+```bash
+# environment variables
+rm -rf ~/.cache
+rm -rf ~/.triton
+export HIP_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
+export ALLREDUCE_STREAM_WITH_COMPUTE=1
+export NCCL_MIN_NCHANNELS=16
+export NCCL_MAX_NCHANNELS=16
+export Allgather_Base_STREAM_WITH_COMPUTE=1
+export SENDRECV_STREAM_WITH_COMPUTE=1
+export HIP_KERNEL_EVENT_SYSTENFENCE=1
+export VLLM_RPC_TIMEOUT=1800000
+export VLLM_USE_PD_SPLIT=1
+export VLLM_USE_PIECEWISE=1
+export VLLM_REJECT_SAMPLE_OPT=1
+export USE_FUSED_RMS_QUANT=0
+export USE_FUSED_SILU_MUL_QUANT=1
+export VLLM_USE_GLOBAL_CACHE13=1
+export VLLM_FUSED_MOE_CHUNK_SIZE=16384
+export VLLM_CUSTOM_CACHE=1
+export VLLM_USE_OPT_CAT=1
+export VLLM_USE_FUSED_FILL_RMS_CAT=1
+export VLLM_USE_LIGHTOP_MOE_SUM_MUL_ADD=0
+export VLLM_USE_LIGHTOP_RMS_ROPE_CONCAT=0
+export VLLM_USE_V32_ENCODE=1
+export VLLM_USE_FLASH_MLA=1
+export VLLM_DISABLE_DSA=0
+export USE_LIGHTOP_TOPK=1
+export USE_LIGHTOP_PER_TOKEN_GROUP_QUANT_FP8=1
+export USE_LIGHTOP_CONVERT_REQ_INDEX_TO_GLOBAL_INDEX=1
+```
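The long list of exports above is easier to maintain as data. A minimal sketch of rendering such a dict into a sourceable shell fragment (the `render_exports` helper is hypothetical, not part of vLLM; only a few of the README's variables are shown):

```python
# Sketch: render tuning knobs as a sourceable script of `export` lines.
# Variable names are copied from the README; render_exports is a
# hypothetical convenience helper, not a vLLM or DTK API.

ENV = {
    "HIP_VISIBLE_DEVICES": "0,1,2,3,4,5,6,7",
    "VLLM_RPC_TIMEOUT": "1800000",
    "VLLM_FUSED_MOE_CHUNK_SIZE": "16384",
    "VLLM_USE_FLASH_MLA": "1",
    "VLLM_DISABLE_DSA": "0",
}

def render_exports(env: dict) -> str:
    """Return a shell fragment with one `export KEY=VALUE` line per entry."""
    return "\n".join(f"export {key}={value}" for key, value in env.items())

if __name__ == "__main__":
    print(render_exports(ENV))
```

The rendered fragment can be written to a file and applied with `source` before launching the server.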
+2. Start vllm serve:
+```bash
+vllm serve ZhipuAI/GLM-5-FP8 \
+    --gpu-memory-utilization 0.925 \
+    --port 8001 \
+    --tensor-parallel-size 8 \
+    --tool-call-parser glm47 \
+    --reasoning-parser glm45 \
+    --enable-auto-tool-choice \
+    --kv-cache-dtype fp8_ds_mla \
+    --served-model-name glm-5-fp8 \
+    --disable-log-requests \
+    --compilation-config '{"pass_config": {"fuse_act_quant": false}}'
+```
+3. Once startup completes, the service can be queried as follows:
+```bash
+curl http://localhost:8001/v1/chat/completions \
+    -H "Content-Type: application/json" \
+    -d '{
+        "model": "glm-5-fp8",
+        "messages": [
+            {"role": "system", "content": "You are a helpful assistant."},
+            {"role": "user", "content": "Summarize GLM-5 in one sentence."}
+        ],
+        "max_tokens": 4096,
+        "temperature": 0.7,
+        "chat_template_kwargs": {"enable_thinking": false}
+    }'
+```
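The same request can be issued from Python with only the standard library. A sketch mirroring the curl example above (`build_chat_request` and `post_chat` are hypothetical helpers, not a vLLM API; the endpoint and model name come from the serve command):

```python
# Sketch: call the OpenAI-compatible endpoint started above, stdlib only.
import json
import urllib.request

def build_chat_request(prompt: str, model: str = "glm-5-fp8") -> dict:
    """Build an OpenAI-compatible chat-completions payload."""
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": prompt},
        ],
        "max_tokens": 4096,
        "temperature": 0.7,
        # vLLM-specific field, forwarded to the chat template
        "chat_template_kwargs": {"enable_thinking": False},
    }

def post_chat(prompt: str,
              url: str = "http://localhost:8001/v1/chat/completions") -> str:
    """POST the payload and return the first choice's message content."""
    req = urllib.request.Request(
        url,
        data=json.dumps(build_chat_request(prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]

# Example (requires the server above to be running):
#   print(post_chat("Summarize GLM-5 in one sentence."))
```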
 #### Multi-node inference
 1. Set the environment variables
 > Note:
@@ -107,9 +166,9 @@ ray start --head --node-ip-address=x.x.x.x --port=6379 --num-gpus=8 --num-cpus=3
 ray start --address='x.x.x.x:6379' --num-gpus=8 --num-cpus=32
 ```
-3. Start the vllm server
+3. Start vllm serve
 ```bash
-vllm serve zai-org/GLM-5 \
+vllm serve ZhipuAI/GLM-5 \
     --port 8001 \
     --trust-remote-code \
     --tensor-parallel-size 32 \
@@ -125,7 +184,7 @@ vllm serve zai-org/GLM-5 \
     --served-model-name glm-5
 ```
-Once startup completes, the service can be queried as follows:
+4. Once startup completes, the service can be queried as follows:
 ```bash
 curl http://localhost:8001/v1/chat/completions \
     -H "Content-Type: application/json" \
@@ -151,7 +210,8 @@ curl http://localhost:8001/v1/chat/completions \
 ## Pretrained weights
 | Model | Weight size | DCU model | Minimum cards | Download |
 |:-----:|:----------:|:----------:|:---------------------:|:----------:|
-| GLM-5 | 744B | BW1000 | 32 | [Hugging Face](https://huggingface.co/zai-org/GLM-5) |
+| GLM-5 | 744B | BW1000 | 32 | [ModelScope](https://modelscope.cn/models/ZhipuAI/GLM-5) |
+| GLM-5-FP8 | 744B | BW1100 | 8 | [ModelScope](https://modelscope.cn/models/ZhipuAI/GLM-5-FP8) |
 ## Source repository and issue feedback
 - https://developer.sourcefind.cn/codes/modelzoo/glm-5_vllm
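The minimum-card figures in the weights table can be sanity-checked with back-of-envelope arithmetic. A sketch, under assumptions not stated in the README (2 bytes/parameter for the BF16 checkpoint, 1 byte/parameter for FP8, weights only, ignoring KV cache and activations):

```python
# Back-of-envelope per-card weight memory for the table above.
# Assumptions (not from the README): BF16 = 2 bytes/param, FP8 = 1 byte/param;
# KV cache, activations, and framework overhead are ignored.

PARAMS_B = 744  # total parameters, in billions

def weight_gb_per_card(bytes_per_param: float, num_cards: int) -> float:
    """Approximate weight memory per card: 1e9 params * bytes ~ 1 GB."""
    total_gb = PARAMS_B * bytes_per_param
    return total_gb / num_cards

if __name__ == "__main__":
    print(f"GLM-5 (BF16) on 32 cards: {weight_gb_per_card(2, 32):.1f} GB/card")
    print(f"GLM-5-FP8 on 8 cards:    {weight_gb_per_card(1, 8):.1f} GB/card")
```

Under these assumptions the BF16 checkpoint needs about 46.5 GB/card on 32 cards and the FP8 checkpoint about 93 GB/card on 8 cards, consistent with the FP8 row requiring the higher-memory BW1100.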
model.properties
@@ -11,4 +11,4 @@ appCategory=对话问答
 # Framework type
 frameType=vllm
 # Accelerator type
-accelerateType=BW1000
\ No newline at end of file
+accelerateType=BW1000,BW1100
\ No newline at end of file
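The updated `accelerateType` is now a comma-separated list. A sketch of how a consumer of model.properties might parse it (`parse_properties` is a hypothetical helper; the sample lines mirror the diff above):

```python
# Sketch: parse simple key=value properties, then split the
# comma-separated accelerateType into individual DCU models.
# parse_properties is a hypothetical helper, not part of ModelZoo tooling.

def parse_properties(text: str) -> dict:
    """Parse key=value lines, skipping comments and blank lines."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props

SAMPLE = """\
# Framework type
frameType=vllm
# Accelerator type
accelerateType=BW1000,BW1100
"""

if __name__ == "__main__":
    cards = parse_properties(SAMPLE)["accelerateType"].split(",")
    print(cards)
```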
vllm-0.11.0+das.opt1.rc3.dtk2604-cp310-cp310-linux_x86_64.whl
File deleted (mode 100644 → 0).