## GLM-5
## Paper
[GLM-5: From Vibe Coding to Agentic Engineering](https://z.ai/blog/glm-5)

## Model Overview
As Zhipu AI's new-generation flagship large model, GLM-5 focuses on complex systems engineering and long-horizon agentic tasks. Scaling model size remains one of the most important routes to improving the intelligence efficiency of artificial general intelligence (AGI). Compared with GLM-4.5, GLM-5 grows from 355B parameters (32B activated) to 744B parameters (40B activated), and its pre-training corpus grows from 23T tokens to 28.5T tokens. In addition, GLM-5 integrates the DeepSeek Sparse Attention (DSA) mechanism, which substantially reduces deployment cost while preserving long-context capability.

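The DSA details are not spelled out here, but the core idea behind top-k sparse attention — each query attends only to its highest-scoring keys rather than the full context — can be sketched in a few lines of NumPy. This is an illustrative sketch only, not GLM-5's actual DSA kernel; the shapes and `top_k` value are arbitrary:

```python
import numpy as np

def topk_sparse_attention(q, k, v, top_k):
    """Single-head attention where each query attends only to its top_k
    highest-scoring keys (sketch of the sparse-attention idea; not DSA itself)."""
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)  # (n_q, n_k) scaled dot-product scores
    # threshold = each row's top_k-th largest score; mask everything below it
    kth = np.partition(scores, -top_k, axis=-1)[:, -top_k:].min(axis=-1, keepdims=True)
    scores = np.where(scores >= kth, scores, -np.inf)
    # numerically stable softmax over the surviving entries
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v  # (n_q, d_v)

rng = np.random.default_rng(0)
q, k, v = rng.normal(size=(4, 8)), rng.normal(size=(16, 8)), rng.normal(size=(16, 8))
out = topk_sparse_attention(q, k, v, top_k=4)
print(out.shape)  # (4, 8)
```

Because each query row keeps only `top_k` of the 16 keys, the attention cost per query is bounded by `top_k` instead of the full context length, which is what makes long contexts cheaper to serve.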
## Environment Dependencies
| Software | Version |
| :------: | :------: |
| DTK | 26.04.2 |
| python | 3.10.12 |
| transformers | 5.2.0.dev0 |
| torch | 2.5.1+das.opt1.dtk2604.20260116.g78471bfd |
| vllm | 0.11.0+das.opt1.rc3.dtk2604 |
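A quick way to check that your environment matches the table is to query the installed package versions. This is a small helper sketch using only the standard library; a missing package is reported rather than raising:

```python
import importlib.metadata as md

def installed_versions(pkgs=("transformers", "torch", "vllm")):
    """Report installed versions of the key packages from the table above."""
    out = {}
    for pkg in pkgs:
        try:
            out[pkg] = md.version(pkg)
        except md.PackageNotFoundError:
            out[pkg] = "not installed"
    return out

print(installed_versions())
```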

Recommended image: harbor.sourcefind.cn:5443/dcu/admin/base/vllm:0.11.0-ubuntu22.04-dtk26.04-0130-py3.10-20260202

- Adjust the `-v` mount paths to match where your model and data actually live

```bash
docker run -it \
    --shm-size 60g \
    --network=host \
    --name glm-5 \
    --privileged \
    --device=/dev/kfd \
    --device=/dev/dri \
    --device=/dev/mkfd \
    --group-add video \
    --cap-add=SYS_PTRACE \
    --security-opt seccomp=unconfined \
    -u root \
    -v /opt/hyhal/:/opt/hyhal/:ro \
    -v /path/your_code_data/:/path/your_code_data/ \
    harbor.sourcefind.cn:5443/dcu/admin/base/vllm:0.11.0-ubuntu22.04-dtk26.04-0130-py3.10-20260202 bash
```

More images are available for download from [光源](https://sourcefind.cn/#/service-list).

The DCU-specific deep-learning libraries this project requires can be downloaded from the [光合](https://developer.sourcefind.cn/tool/) developer community; install the remaining packages per requirements.txt:
```bash
pip uninstall -y vllm
pip install vllm-0.11.0+das.opt1.rc3.dtk2604-cp310-cp310-linux_x86_64.whl
pip install -r requirements.txt
```

## Dataset
`N/A`

## Training
`N/A`

## Inference
### vllm
#### Multi-Node Inference
1. Set the environment variables
> Please note:
> Write the environment variables on each node into a `.sh` file, then `source` that `.sh` file on every compute node.
>
> VLLM_HOST_IP: the node's local communication IP; prefer the IP on the IB NIC to **avoid RCCL timeout issues**.
>
> NCCL_SOCKET_IFNAME and GLOO_SOCKET_IFNAME: the name of the interface that carries that local communication IP.
>
> To list interfaces and their IPs: `ifconfig`
>
> To check IB port state: `ibstat` — the port must be in the Active state to be usable, and all nodes must be consistent.

```bash
export ALLREDUCE_STREAM_WITH_COMPUTE=1
export VLLM_HOST_IP=x.x.x.x # this compute node's IP; use the address bound to the IB SOCKET_IFNAME interface
export NCCL_SOCKET_IFNAME=ibxxxx
export GLOO_SOCKET_IFNAME=ibxxxx
export NCCL_IB_HCA=mlx5_0:1 # name of the IB HCA in this environment
unset NCCL_ALGO
export NCCL_MIN_NCHANNELS=16
export NCCL_MAX_NCHANNELS=16
export NCCL_NET_GDR_READ=1
export HIP_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
export VLLM_SPEC_DECODE_EAGER=1
export VLLM_MLA_DISABLE=0
export VLLM_USE_FLASH_MLA=1

# additional environment variables recommended on K100_AI clusters:
export VLLM_ENFORCE_EAGER_BS_THRESHOLD=44
export VLLM_RPC_TIMEOUT=1800000

# CPU core binding on Hygon CPUs
export VLLM_NUMA_BIND=1
export VLLM_RANK0_NUMA=0
export VLLM_RANK1_NUMA=1
export VLLM_RANK2_NUMA=2
export VLLM_RANK3_NUMA=3
export VLLM_RANK4_NUMA=4
export VLLM_RANK5_NUMA=5
export VLLM_RANK6_NUMA=6
export VLLM_RANK7_NUMA=7
```

2. Start the Ray cluster
> x.x.x.x is the VLLM_HOST_IP from step 1

```bash
# run on the head node
ray start --head --node-ip-address=x.x.x.x --port=6379 --num-gpus=8 --num-cpus=32
# run on each worker node
ray start --address='x.x.x.x:6379' --num-gpus=8 --num-cpus=32
```

3. Start the vllm server
```bash
vllm serve zai-org/GLM-5 \
     --port 8001 \
     --trust-remote-code \
     --tensor-parallel-size 32 \
     --gpu-memory-utilization 0.85 \
     --speculative-config.method mtp \
     --speculative-config.num_speculative_tokens 1 \
     --tool-call-parser glm47 \
     --reasoning-parser glm45 \
     --enable-auto-tool-choice \
     --served-model-name glm-5
```
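As a sanity check on the topology: `--tensor-parallel-size 32` with 8 GPUs per node (matching `HIP_VISIBLE_DEVICES=0,1,2,3,4,5,6,7` above) implies a 4-node Ray cluster — one head plus three workers. The arithmetic can be captured in a tiny helper (function name and assumptions are illustrative):

```python
def nodes_required(tensor_parallel_size: int, gpus_per_node: int) -> int:
    """Number of nodes needed for a given tensor-parallel degree,
    assuming every GPU on each node is used (illustrative helper)."""
    if tensor_parallel_size % gpus_per_node != 0:
        raise ValueError("tensor_parallel_size must be a multiple of gpus_per_node")
    return tensor_parallel_size // gpus_per_node

print(nodes_required(32, 8))  # 4
```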

Once the server has started, it can be accessed as follows:
```bash
curl http://localhost:8001/v1/chat/completions   \
    -H "Content-Type: application/json"  \
    -d '{
        "model": "glm-5",
        "messages": [
          {"role": "system", "content": "You are a helpful assistant."},
          {"role": "user", "content": "Summarize GLM-5 in one sentence."}
        ],
        "max_tokens": 4096,
        "temperature": 1
    }'
```
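The same request can also be issued from Python using only the standard library. The sketch below builds a request identical to the curl example; the actual send requires a running server on localhost:8001, so it is left commented out (the endpoint and field values simply mirror the curl example above):

```python
import json
from urllib.request import Request, urlopen

def build_chat_request(prompt: str, base_url: str = "http://localhost:8001") -> Request:
    """Build the same chat-completions request as the curl example above."""
    payload = {
        "model": "glm-5",
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": prompt},
        ],
        "max_tokens": 4096,
        "temperature": 1,
    }
    return Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )

# to actually send it (requires the vllm server above to be running):
# with urlopen(build_chat_request("Summarize GLM-5 in one sentence.")) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```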

## Results
<div align=center>
    <img src="./doc/xxx.png"/>
</div>

### Accuracy
`DCU accuracy is consistent with GPU. Inference framework: vllm.`

## Pretrained Weights
| Model | Weight Size | DCU Model | Minimum Cards | Download |
|:-----:|:----------:|:----------:|:---------------------:|:----------:|
| GLM-5 | 754B | BW1000 | 32 | [Hugging Face](https://huggingface.co/zai-org/GLM-5) |

## Source Repository & Issue Feedback
- https://developer.sourcefind.cn/codes/modelzoo/glm-5_vllm

## References
- https://github.com/zai-org/GLM-5