"tutorials/vscode:/vscode.git/clone" did not exist on "168794cdbcc3858f26fea48437a4dee6f8fe5ee5"
Commit a05560f3 authored by chenych's avatar chenych
Browse files

First commit

parents
# Contributors
None
MIT License
Copyright (c) 2025 DeepSeek
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
# GLM-4.6
## Paper
[GLM-4.5: Agentic, Reasoning, and Coding (ARC) Foundation Models](https://arxiv.org/abs/2508.06471)
The GLM-4.6 technical report is the same as the GLM-4.5 report.
## Model Architecture
GLM-4.6 is Zhipu AI's latest flagship model, with 355B total parameters and 32B active parameters. GLM-4.6 surpasses GLM-4.5 across all core capabilities, specifically:
- **Advanced coding:** on public benchmarks and in real-world programming tasks, GLM-4.6's coding ability is on par with Claude Sonnet 4, making it the best coding model known in China.
- **Context length:** the context window grows from 128K to 200K tokens, accommodating longer code and agentic tasks.
- **Reasoning:** reasoning ability is improved, and tools can be called during the reasoning process.
- **Search:** stronger tool calling and search-agent behavior, with better performance inside agent frameworks.
- **Writing:** better aligned with human preferences in style, readability, and role-play scenarios.
- **Multilingual translation:** further improved handling of cross-lingual tasks.
<div align=center>
<img src="./doc/model.png"/>
</div>
## Algorithm Principles
<div align=center>
<img src="./doc/method.png"/>
</div>
## Environment Setup
### Hardware Requirements
DCU model: K100AI; number of nodes: 4; number of cards: 32.
Adjust the `-v` mount paths, `docker_name`, and `imageID` below to match your environment.
### Docker (Method 1)
```bash
docker pull image.sourcefind.cn:5000/dcu/admin/base/vllm:0.9.2-ubuntu22.04-dtk25.04.1-rc5-rocblas104381-0915-das1.6-py3.10-20250916-rc2
docker run -it --shm-size 200g --network=host --name {docker_name} --privileged --device=/dev/kfd --device=/dev/dri --device=/dev/mkfd --group-add video --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -u root -v /path/your_code_data/:/path/your_code_data/ -v /opt/hyhal/:/opt/hyhal/:ro {imageID} bash
cd /your_code_path/glm-4.6_vllm
```
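For reference, a hypothetical filled-in version of the `docker run` command above might look like the following; the container name and host mount path are illustrative only and should be replaced with your own values.
```bash
# Hypothetical values: replace the container name and host path with your own
docker run -it --shm-size 200g --network=host --name glm46_vllm --privileged \
  --device=/dev/kfd --device=/dev/dri --device=/dev/mkfd --group-add video \
  --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -u root \
  -v /data/glm-4.6/:/data/glm-4.6/ -v /opt/hyhal/:/opt/hyhal/:ro \
  image.sourcefind.cn:5000/dcu/admin/base/vllm:0.9.2-ubuntu22.04-dtk25.04.1-rc5-rocblas104381-0915-das1.6-py3.10-20250916-rc2 bash
```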
### Dockerfile (Method 2)
```bash
cd docker
docker build --no-cache -t glm4.6:latest .
docker run -it --shm-size 200g --network=host --name {docker_name} --privileged --device=/dev/kfd --device=/dev/dri --device=/dev/mkfd --group-add video --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -u root -v /path/your_code_data/:/path/your_code_data/ -v /opt/hyhal/:/opt/hyhal/:ro {imageID} bash
cd /your_code_path/glm-4.6_vllm
```
### Anaconda (Method 3)
The specialized deep-learning libraries this project requires for DCU cards can be downloaded from the [光合 (Guanghe)](https://developer.sourcefind.cn/tool/) developer community.
```bash
DTK: 25.04.1
python: 3.10.12
torch: 2.5.1+das.opt1.dtk25041
transformers: 4.56.1
vllm: 0.9.2+das.opt1.rc2.dtk25041
```
`Tip: the DTK driver, PyTorch, and the other DCU-related tool versions listed above must correspond to one another exactly.`
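As a quick sanity check, the installed component versions can be compared against the list above from inside the activated environment; a minimal sketch:
```bash
# Print the installed versions of the DCU-related Python components listed above
python -c "import torch; print('torch:', torch.__version__)"
python -c "import transformers; print('transformers:', transformers.__version__)"
python -c "import vllm; print('vllm:', vllm.__version__)"
```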
## Dataset
## Training
Not available at this time.
## Inference
Example model: [GLM-4.6](https://huggingface.co/zai-org/GLM-4.6)
### vLLM Inference
#### Multi-node server
1. Set the environment variables
> Note:
> On each node, write the environment variables into a `.sh` file, save it, and then `source` the `.sh` file on every compute node (a verification sketch follows the variable list below).
>
> VLLM_HOST_IP: the IP of the node's local communication interface; prefer the IB NIC's IP to **avoid RCCL timeout issues**.
>
> NCCL_SOCKET_IFNAME and GLOO_SOCKET_IFNAME: the interface name that corresponds to the node's local communication IP.
>
> To find interfaces and their IPs: ifconfig (a query sketch follows the image below).
>
> To check IB port status: ibstat. The port must be in the Active state to be usable, and all nodes must be kept consistent.
<div align=center>
<img src="./doc/ip_bw.png"/>
</div>
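As mentioned in the notes above, the interface name and IB port state can be queried before filling in the variables below; a minimal sketch using the standard tools (interface names differ per machine):
```bash
# List network interfaces and their IPs; pick the IB NIC's IP for VLLM_HOST_IP
# and use its interface name for NCCL_SOCKET_IFNAME / GLOO_SOCKET_IFNAME
ifconfig
# Check the IB port status; every node must report "State: Active"
ibstat | grep -i "state"
```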
```bash
export ALLREDUCE_STREAM_WITH_COMPUTE=1
export VLLM_HOST_IP=x.x.x.x # IP of this compute node; prefer the IP bound to the IB interface named by SOCKET_IFNAME
export HIP_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
export HSA_FORCE_FINE_GRAIN_PCIE=1
export NCCL_SOCKET_IFNAME=ibxxxx
export GLOO_SOCKET_IFNAME=ibxxxx
export NCCL_IB_HCA=mlx5_0:1
unset NCCL_ALGO
export NCCL_IB_DISABLE=0
export NCCL_MAX_NCHANNELS=16
export NCCL_MIN_NCHANNELS=16
export NCCL_NET_GDR_READ=1
export NCCL_DEBUG=INFO
export NCCL_MIN_P2P_NCHANNELS=16
export NCCL_NCHANNELS_PER_PEER=16
export HIP_USE_GRAPH_QUEUE_POOL=1
export VLLM_ENABLE_MOE_FUSED_GATE=0
export VLLM_ENFORCE_EAGER_BS_THRESHOLD=44
export VLLM_RPC_TIMEOUT=1800000
export VLLM_USE_FLASH_MLA=1
# NUMA core binding for Hygon CPUs; can be omitted on Intel CPUs
export VLLM_NUMA_BIND=1
export VLLM_RANK0_NUMA=0
export VLLM_RANK1_NUMA=1
export VLLM_RANK2_NUMA=2
export VLLM_RANK3_NUMA=3
export VLLM_RANK4_NUMA=4
export VLLM_RANK5_NUMA=5
export VLLM_RANK6_NUMA=6
export VLLM_RANK7_NUMA=7
# Extra environment variables required on BW clusters
export NCCL_NET_GDR_LEVEL=7
export NCCL_SDMA_COPY_ENABLE=0
```
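As the notes in step 1 suggest, these exports are typically saved to a script and sourced on every compute node; the file name `env.sh` below is arbitrary. A quick grep confirms the key variables are set:
```bash
# On every compute node: save the exports above to env.sh, then load and verify them
source ./env.sh
env | grep -E "VLLM_HOST_IP|NCCL_SOCKET_IFNAME|GLOO_SOCKET_IFNAME"
```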
2. Start the Ray cluster
> x.x.x.x is the master (head) node's VLLM_HOST_IP from step 1
```bash
# run on the head node
ray start --head --node-ip-address=x.x.x.x --port=6379 --num-gpus=8 --num-cpus=32
# run on each worker node
ray start --address='x.x.x.x:6379' --num-gpus=8 --num-cpus=32
```
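After both commands have run, the cluster state can be verified from the head node; with 4 nodes of 8 cards each, the summary should report 32 GPUs in total before the server is launched.
```bash
# Run on the head node: all 4 nodes and 32 GPUs should appear in the resource summary
ray status
```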
3. Start the vLLM server
> On Intel CPUs, the `--enforce-eager` flag must be added
```bash
vllm serve zai-org/GLM-4.6 \
--enforce-eager \
--trust-remote-code \
--distributed-executor-backend ray \
--dtype bfloat16 \
--tensor-parallel-size 32 \
--max-model-len 32768 \
--block-size 64 \
--no-enable-chunked-prefill \
--no-enable-prefix-caching \
--port 8001
```
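Before sending chat requests, you can confirm that the server has finished loading by listing the served models via the OpenAI-compatible API:
```bash
# Should return a JSON model list that includes "zai-org/GLM-4.6" once the server is ready
curl http://127.0.0.1:8001/v1/models
```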
Once the server is up, it can be accessed as follows:
```bash
curl http://127.0.0.1:8001/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "zai-org/GLM-4.6",
"messages": [
{
"role": "user",
"content": "请介绍下你自己。"
}
],
"max_tokens": 1024,
"temperature": 0.7
}'
```
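The same endpoint also supports OpenAI-style streaming; a minimal variant of the request above (the only change is adding `"stream": true`) could look like this:
```bash
# Stream tokens back as they are generated instead of waiting for the full reply
curl http://127.0.0.1:8001/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "zai-org/GLM-4.6",
    "messages": [{"role": "user", "content": "Please introduce yourself."}],
    "max_tokens": 1024,
    "temperature": 0.7,
    "stream": true
  }'
```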
## Results
<div align=center>
<img src="./doc/results_dcu.png"/>
</div>
### Accuracy
DCU accuracy matches GPU accuracy; inference framework: vLLM.
## Application Scenarios
### Algorithm Category
`Conversational Q&A`
### Key Application Industries
`Manufacturing, Finance, Education, Broadcast Media`
## Pretrained Weights
- [GLM-4.6](https://huggingface.co/zai-org/GLM-4.6)
## Source Repository and Issue Reporting
- https://developer.sourcefind.cn/codes/modelzoo/glm-4.6_vllm
## References
- https://z.ai/blog/glm-4.6
- https://github.com/zai-org/GLM-4.5
FROM image.sourcefind.cn:5000/dcu/admin/base/vllm:0.9.2-ubuntu22.04-dtk25.04.1-rc5-rocblas104381-0915-das1.6-py3.10-20250916-rc2
# Model unique identifier
modelCode=1767
# Model name
modelName=glm-4.6_vllm
# Model description
modelDescription=GLM-4.6 is Zhipu AI's latest flagship model, with 355B total parameters and 32B active parameters.
# Application scenarios
appScenario=Inference,Conversational Q&A,Manufacturing,Finance,Education,Broadcast Media
# Framework type
frameType=vllm
# Accelerator type
accelerateType=K100AI