ModelZoo / GLM-5_vllm

Commit fbac2cd8 authored Apr 29, 2026 by chenych

Improve the README description

parent 85ca2fc1
Showing 1 changed file with 37 additions and 17 deletions (+37 −17)

README.md
@@ -16,7 +16,7 @@
...
@@ -16,7 +16,7 @@
| sglang | 0.5.10rc0 |
| sglang | 0.5.10rc0 |
当前仅支持镜像:
当前仅支持镜像:
- **For vLLM inference, use:** `docker pull harbor.sourcefind.cn:5443/dcu/admin/base/vllm:0.15.1-ubuntu22.04-dtk26.04-py3.10-20260409`
- **For SGLang inference, use:** `harbor.sourcefind.cn:5443/dcu/admin/base/custom:sglang-0.5.10-glm5-0416`
- Adjust the `-v` mount paths to match the actual location of your model.
@@ -37,11 +37,20 @@ docker run -it \
-u root \
-v /opt/hyhal/:/opt/hyhal/:ro \
-v /path/your_code_data/:/path/your_code_data/ \
harbor.sourcefind.cn:5443/dcu/admin/base/vllm:0.15.1-ubuntu22.04-dtk26.04-py3.10-20260409 bash
```
More images are available for download from [光源](https://sourcefind.cn/#/service-list).
## Pretrained weights

**Choose the model matching the "Supported DCU models" column. FP8 weights are supported only on BW1100/BW1101; do not use them on other DCU models!**

| Model | Weight size | Data type | Supported DCU models | Minimum cards | Download |
|:-----:|:----------:|:----------:|:----------:|:---------------------:|:----------:|
| GLM-5 | 744B | BF16 | BW1000 | 32 | [ModelScope](https://modelscope.cn/models/ZhipuAI/GLM-5) |
| GLM-5 | 744B | BF16 | BW1100 | 16 | [ModelScope](https://modelscope.cn/models/ZhipuAI/GLM-5) |
| GLM-5-FP8 | 744B | FP8 | BW1100 | 8 | [ModelScope](https://modelscope.cn/models/ZhipuAI/GLM-5-FP8) |
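As a rough sanity check on the minimum-card column above, the per-card weight footprint can be estimated from the parameter count and data type. This is a back-of-envelope sketch only: it assumes 2 bytes/parameter for BF16 and 1 byte/parameter for FP8, and ignores KV cache, activations, and runtime overhead.

```shell
# Per-card weight footprint in GB: params(B) * bytes_per_param / num_cards
params_b=744
for cfg in "BF16 2 32" "BF16 2 16" "FP8 1 8"; do
  set -- $cfg   # $1=dtype, $2=bytes per param, $3=number of cards
  echo "$1 on $3 cards: ~$(( params_b * $2 / $3 )) GB of weights per card"
done
```

The BF16/32-card and FP8/8-card configurations land at roughly 46 GB and 93 GB of raw weights per card respectively, which is why the table pairs fewer cards with the FP8 variant.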
## Dataset

`N/A`

@@ -49,7 +58,9 @@ docker run -it \

`N/A`
## Inference

> 1. If you see `ImportError: librocm_smi64.so.2: cannot open shared object file: No such file or directory`, the machine's hyhal version is too old; please upgrade it.
> 2. FP8 models are supported only on BW1100; use the BF16 model on other DCU models.
> 3. MTP is not yet supported.
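To check in advance whether the `librocm_smi64` problem from note 1 applies to your machine, you can look the library up in the dynamic linker cache. A small sketch, assuming a Linux host with `ldconfig` available:

```shell
# Report whether librocm_smi64.so.2 is known to the dynamic linker
if ldconfig -p 2>/dev/null | grep -q 'librocm_smi64\.so\.2'; then
  echo "librocm_smi64.so.2: found"
else
  echo "librocm_smi64.so.2: missing - consider upgrading hyhal"
fi
```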
### SGLang

1. Set the environment variables
@@ -60,15 +71,15 @@ export SGLANG_ROCM_USE_AITER_MOE=0
```
2. Start the service
```bash
model_path=ZhipuAI/GLM-5-FP8
# FP8 model
option="--numa-node 0 0 0 0 1 1 1 1 "
option+=" --disable-radix-cache "
option+=" --chunked-prefill-size 16384"
option+=" --page-size 64 "
option+=" --nsa-prefill-backend flashmla_auto --nsa-decode-backend flashmla_kv "
# option+=" --quantization slimquant_marlin "
python3 -m sglang.launch_server --model-path "${model_path}" ${option} \
  --trust-remote-code \
@@ -100,7 +111,7 @@ curl http://localhost:8001/v1/chat/completions \
}'
```
### vLLM

#### Single-node inference

1. Set the environment variables
```bash
@@ -136,6 +147,23 @@ export USE_LIGHTOP_CONVERT_REQ_INDEX_TO_GLOBAL_INDEX=1
```
2. Start `vllm serve`
- **BF16** model
```bash
# --tensor-parallel-size: 32 on BW1000, 16 on BW1100
vllm serve ZhipuAI/GLM-5 \
  --port 8001 \
  --trust-remote-code \
  --tensor-parallel-size 32 \
  --gpu-memory-utilization 0.85 \
  --distributed-executor-backend ray \
  --dtype bfloat16 \
  --max-model-len 32768 \
  --tool-call-parser glm47 \
  --reasoning-parser glm45 \
  --enable-auto-tool-choice \
  --served-model-name glm-5
```
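Loading 744B of weights across many cards can take a while, so it helps to poll the endpoint before sending requests. A sketch of a bounded wait loop, assuming `curl` is installed and the server from the command above listens on port 8001:

```shell
# Poll an HTTP endpoint until it responds, with a bounded number of attempts
wait_ready() {
  local url=$1 tries=${2:-120} delay=${3:-5}
  local i
  for i in $(seq "$tries"); do
    if curl -sf "$url" >/dev/null 2>&1; then
      echo "ready after $i attempt(s)"
      return 0
    fi
    sleep "$delay"
  done
  echo "gave up after $tries attempts" >&2
  return 1
}

# Usage: wait_ready http://localhost:8001/v1/models
```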
- **FP8** model. **FP8 weights are supported only on BW1100/BW1101; use the BF16 model on other DCU models.**
```bash
vllm serve ZhipuAI/GLM-5-FP8 \
  --gpu-memory-utilization 0.925 \
@@ -155,7 +183,7 @@ vllm serve ZhipuAI/GLM-5-FP8 \
curl http://localhost:8001/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "glm-5 or glm-5-fp8",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "Summarize GLM-5 in one sentence."}
@@ -221,6 +249,7 @@ ray start --address='x.x.x.x:6379' --num-gpus=8 --num-cpus=32
```
3. Start `vllm serve`
- **BF16** model
```bash
vllm serve ZhipuAI/GLM-5 \
  --port 8001 \
@@ -230,8 +259,6 @@ vllm serve ZhipuAI/GLM-5 \
  --distributed-executor-backend ray \
  --dtype bfloat16 \
  --max-model-len 32768 \
  --tool-call-parser glm47 \
  --reasoning-parser glm45 \
  --enable-auto-tool-choice \
@@ -259,14 +286,7 @@ curl http://localhost:8001/v1/chat/completions \
</div>

### Accuracy

`DCU accuracy is consistent with GPU. Inference frameworks: vllm, sglang.`
## 预训练权重
| 模型名称 | 权重大小 | DCU型号 | 最低卡数需求 |下载地址|
|:-----:|:----------:|:----------:|:---------------------:|:----------:|
| GLM-5 | 744B | BW1000 | 32 |
[
ModelScope
](
https://modelscope.cn/models/ZhipuAI/GLM-5
)
|
| GLM-5 | 744B | BW1100 | 16 |
[
ModelScope
](
https://modelscope.cn/models/ZhipuAI/GLM-5
)
|
| GLM-5-FP8 | 744B | BW1100 | 8 |
[
ModelScope
](
https://modelscope.cn/models/ZhipuAI/GLM-5-FP8
)
|
## Source repository and issue reporting

- https://developer.sourcefind.cn/codes/modelzoo/glm-5_vllm