Add vllm inference and update README

Commit 45f05d82 authored Nov 24, 2025 by chenych
5 changed files
--- a/README.md
+++ b/README.md
 # GLM-4.1V
 ## Paper
-`GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning`
- https://arxiv.org/abs/2507.01006
+[GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning](https://arxiv.org/abs/2507.01006)

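-## Model Architecture
-GLM-4.1V-Thinking consists of three parts:
-1. A vision Transformer encoder that processes and encodes images and videos;
-2. An MLP projector that aligns visual features with tokens;
-3. A large language model that serves as the language decoder, processing multimodal tokens and generating token completions.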
-
-<div align=center>
-    <img src="./doc/model.png"/>
-</div>
-
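-## Algorithm
+## Model Overview
 GLM-4.1V-Thinking aims to explore the upper limit of reasoning ability in vision-language models. It introduces a "thinking paradigm" and uses Reinforcement Learning with Curriculum Sampling (RLCS) to improve the model's capabilities across the board. Among vision-language models at the 10B-parameter scale it delivers leading performance, and on 18 benchmark tasks it matches or surpasses the 72B-parameter Qwen-2.5-VL-72B.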

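-GLM-4.1V-Thinking processes images and videos at their native resolution and aspect ratio. For video input, an additional time-index token is inserted after each frame to strengthen the model's temporal understanding.
-
-Compared with the previous-generation CogVLM2 and GLM-4V series models, **GLM-4.1V-Thinking** brings the following improvements:
-
+Compared with the previous-generation CogVLM2 and GLM-4V series models, **GLM-4.1V-Thinking** brings the following improvements:
 1. The first reasoning model in the series. Its strengths are not limited to mathematics; it reaches world-leading levels in multiple sub-domains.
-2. Supports a **64k** context length.
-3. Supports **arbitrary aspect ratios** and image resolutions up to **4k**.
+2. Supports a **64k** context length.
+3. Supports **arbitrary aspect ratios** and image resolutions up to **4k**.
 4. Provides an open-source model version with **Chinese-English bilingual** support.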

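+GLM-4.1V-Thinking processes images and videos at their native resolution and aspect ratio. For video input, an additional time-index token is inserted after each frame to strengthen the model's temporal understanding. GLM-4.1V-Thinking consists of three parts:
+1. A vision Transformer encoder that processes and encodes images and videos;
+2. An MLP projector that aligns visual features with tokens;
+3. A large language model that serves as the language decoder, processing multimodal tokens and generating token completions.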
+
 <div align=center>
-    <img src="./doc/methods.png"/>
+    <img src="./doc/model.png"/>
 </div>

-## Environment Setup
-### Hardware Requirements
-DCU model: K100_AI; nodes: 1; cards: 1.
+## Environment Dependencies
+| Software | Version |
+| :------: | :------: |
+| DTK | 25.04.2 |
+| python | 3.10.12 |
+| torch | 2.5.1+das.opt1.dtk25042 |
+| transformers | 4.53.2 |
+| vllm | 0.11.0 |
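
As a rough way to confirm that a Python environment matches the table above, the versions can be compared programmatically. The sketch below is illustrative only: the helper names `installed_version` and `find_mismatches` are hypothetical, and DTK is left out on the assumption that it is not installed as a Python package.

```python
from importlib import metadata

# Expected versions copied from the dependency table.
# DTK is omitted (assumption: it is not a Python package).
EXPECTED = {
    "torch": "2.5.1+das.opt1.dtk25042",
    "transformers": "4.53.2",
    "vllm": "0.11.0",
}


def installed_version(package):
    """Return the installed version string, or None if the package is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def find_mismatches(expected, lookup=installed_version):
    """Return {package: (expected, found)} for every package whose version differs."""
    mismatches = {}
    for package, wanted in expected.items():
        found = lookup(package)
        if found != wanted:
            mismatches[package] = (wanted, found)
    return mismatches


# Example usage inside the container:
#   for pkg, (wanted, found) in find_mismatches(EXPECTED).items():
#       print(f"{pkg}: expected {wanted}, found {found}")
```

The `lookup` parameter exists only so the comparison can be exercised without the real packages installed.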

-Adjust the `-v` path, `docker_name`, and `imageID` to match your setup.
+Recommended image:
+- Adjust the `-v` mount path to match where your model is stored.

-### Docker (Method 1)
+```bash
+docker run -it --shm-size 60g --network=host --name glm-41v --privileged --device=/dev/kfd --device=/dev/dri --device=/dev/mkfd --group-add video --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -u root -v /path/your_code_data/:/path/your_code_data/ -v /opt/hyhal/:/opt/hyhal/:ro image.sourcefind.cn:5000/dcu/admin/base/custom:vllm-ubuntu22.04-dtk25.04.2-py3.10-minimax-m2 bash
 ```
-docker pull image.sourcefind.cn:5000/dcu/admin/base/vllm:0.8.5-ubuntu22.04-dtk25.04.1-rc5-das1.6-py3.10-20250711
-docker run -it --shm-size 200g --network=host --name {docker_name} --privileged --device=/dev/kfd --device=/dev/dri --device=/dev/mkfd --group-add video --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -u root -v /path/your_code_data/:/path/your_code_data/ -v /opt/hyhal/:/opt/hyhal/:ro {imageID} bash
-
-cd /your_code_path/glm-4.1v_pytorch
-pip install transformers==4.53.2
-```
-### Dockerfile (Method 2)
-This section shows how to use the Dockerfile.
-```
-docker build --no-cache -t glm-4.1v:latest .
-docker run -it --shm-size 200g --network=host --name {docker_name} --privileged --device=/dev/kfd --device=/dev/dri --device=/dev/mkfd --group-add video --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -u root -v /path/your_code_data/:/path/your_code_data/ -v /opt/hyhal/:/opt/hyhal/:ro {imageID} bash
-
-cd /your_code_path/glm-4.1v_pytorch
-pip install transformers==4.53.2
-```
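-### Anaconda (Method 3)
-This section gives detailed steps for local setup and compilation, for example:
+More images are available for download from [光源](https://sourcefind.cn/#/service-list).

 The special deep learning libraries that this project requires for DCU cards can be downloaded and installed from the [光合](https://developer.sourcefind.cn/tool/) developer community.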
-```
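-DTK driver: dtk25.04.1
-python: python3.10
-torch: 2.4.1+das.opt1.dtk25041
-```
-`Tips: the versions of the DCU-related tools above (DTK driver, python, torch, and so on) must match each other exactly`
-
-Install the other, non-deep-learning libraries according to requirements.txt: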
-```
-pip install transformers==4.53.2
-```

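 ## Dataset
 [LLaMA-Factory](https://developer.sourcefind.cn/codes/OpenDAS/llama-factory) already supports fine-tuning this model. The following explains how to build a dataset; the example is a dataset that uses two images. Organize your dataset as `finetune.json`, then adjust the data configuration in llama-factory accordingly.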
@@ -105,7 +76,6 @@ pip install transformers==4.53.2

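 ## Training
 ### Fine-tuning with Llama Factory (recommended)
-
 After installing llama-factory by following the [LLaMA-Factory](https://developer.sourcefind.cn/codes/OpenDAS/llama-factory) repository instructions, check the transformers version. If it is not 4.53.2, reinstall it with `pip install transformers==4.53.2`.

 Because this transformers version does not match the one [LLaMA-Factory](https://developer.sourcefind.cn/codes/OpenDAS/llama-factory) expects, set the following environment variable before starting training to skip the version check: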
@@ -114,8 +84,8 @@ pip install transformers==4.53.2
 export DISABLE_VERSION_CHECK=1
 ```

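-Tips:
-For single-card fine-tuning, comment out the `deepspeed` parameter in the yaml file.
+> **Tips:**
+> For single-card fine-tuning, comment out the `deepspeed` parameter in the yaml file.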

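 #### Full-parameter fine-tuning
 For an example SFT training script, see the corresponding yaml file under `llama-factory/train_full`.
@@ -133,18 +103,54 @@ For an example SFT training script, see the corresponding yaml file under `llama-factory/train_lora`.
 Parameters are the same as in [#Full-parameter fine-tuning](#full-parameter-fine-tuning)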

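 ## Inference
+### transformers
 - `trans_infer_transformers.py`: runs single-turn inference with the `transformers` library.
 - `trans_infer_cli.py`: an interactive command-line script that uses `transformers` as the inference backend. Use it for multi-turn conversations.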

-```
+```bash
 # Run single-turn inference
 python inference/trans_infer_transformers.py
 # Start an interactive chat session
 python inference/trans_infer_cli.py
 ```

-## result
+### vllm
+```bash
+export HIP_VISIBLE_DEVICES=0
+export ALLREDUCE_STREAM_WITH_COMPUTE=1
+export HSA_FORCE_FINE_GRAIN_PCIE=1
+## Start the server
+vllm serve THUDM/GLM-4.1V-9B-Thinking \
+  --limit-mm-per-prompt '{"image":32}' \
+  --allowed-local-media-path / \
+  --trust-remote-code \
+  --max-model-len 32768 \
+  --served-model-name glm-4.1v-thinking
+
+## Send a request
+curl http://localhost:8000/v1/chat/completions  \
+    -H "Content-Type: application/json"  \
+    -d '{
+        "model": "glm-4.1v-thinking",
+        "messages": [
+        {
+          "role": "user",
+          "content": [
+            {
+              "type": "image_url",
+              "image_url": {"url": "file:///home/glm-4.1v_pytorch/doc/Grayscale_8bits_palette_sample_image.png"}
+            },
+            {
+              "type": "text",
+              "text": "describe this image"
+            }
+          ]
+        }],
+        "temperature": 0.7
+    }'
+```
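
The same request can also be issued from Python. The following is a minimal sketch using only the standard library; it assumes the server started above is listening on `http://localhost:8000` and exposes the OpenAI-compatible `/v1/chat/completions` route. The helper names `build_chat_request` and `post_chat` are hypothetical and not part of this repository.

```python
import json
import urllib.request


def build_chat_request(model, image_url, prompt, temperature=0.7):
    """Build a chat-completions payload with one image and one text part."""
    return {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "image_url", "image_url": {"url": image_url}},
                    {"type": "text", "text": prompt},
                ],
            }
        ],
        "temperature": temperature,
    }


def post_chat(base_url, payload, timeout=300):
    """POST the payload as JSON and return the decoded JSON response."""
    request = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Example usage (requires the vllm server to be running):
#   payload = build_chat_request(
#       "glm-4.1v-thinking",
#       "file:///home/glm-4.1v_pytorch/doc/Grayscale_8bits_palette_sample_image.png",
#       "describe this image",
#   )
#   reply = post_chat("http://localhost:8000", payload)
#   print(reply["choices"][0]["message"]["content"])
```

The `model` field must match the value passed to `--served-model-name`, and a `file://` URL is only accepted for paths under the directory given to `--allowed-local-media-path`.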

+## Results
 <div align=center>
    <img src="./doc/results-dcu.png"/>
 </div>
@@ -158,16 +164,11 @@ python inference/trans_infer_cli.py
 | A800 | 375 | 0.5245 |
 | K100_AI | 375 | 0.5264 |

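-## Application Scenarios
-### Algorithm Category
-`Conversational Q&A`
-
-### Key Application Industries
-`Manufacturing, broadcast media, home furnishing, education`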
-
 ## Pretrained Weights
- [GLM-4.1V-9B-Base](https://huggingface.co/THUDM/GLM-4.1V-9B-Base)
- [GLM-4.1V-9B-Thinking](https://huggingface.co/THUDM/GLM-4.1V-9B-Thinking)
+| Model | Weight size | DCU model | Minimum cards required | Download |
+|:-----:|:----------:|:----------:|:---------------------:|:----------:|
+| GLM-4.1V-9B-Base | 9B | K100AI,BW1000 | 1 | [Download](https://huggingface.co/THUDM/GLM-4.1V-9B-Base) |
+| GLM-4.1V-9B-Thinking | 9B | K100AI,BW1000 | 1 | [Download](https://huggingface.co/THUDM/GLM-4.1V-9B-Thinking) |

 ## Source Repository and Issue Feedback
 - https://developer.sourcefind.cn/codes/modelzoo/glm-4.1v_pytorch

--- a/doc/results.png
+++ b/doc/results.png
--- a/docker/Dockerfile
+++ b/docker/Dockerfile
-FROM image.sourcefind.cn:5000/dcu/admin/base/vllm:0.8.5-ubuntu22.04-dtk25.04.1-rc5-das1.6-py3.10-20250711
\ No newline at end of file
--- a/inference/trans_infer_cli.py
+++ b/inference/trans_infer_cli.py
@@ -26,7 +26,6 @@ import re
 import torch
 from transformers import AutoProcessor, Glm4vForConditionalGeneration

-
 def build_content(image_paths, video_path, text):
    content = []
    if image_paths:

--- a/model.properties
+++ b/model.properties
@@ -4,7 +4,11 @@ modelCode=1668
 modelName=GLM-4.1V_pytorch
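 # Model description
 modelDescription=By introducing a Chain-of-Thought reasoning mechanism, GLM-4.1V-9B-Thinking comprehensively surpasses traditional non-reasoning vision models in answer accuracy, richness of content, and interpretability.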
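-# Application scenarios
-appScenario=inference,training,conversational Q&A,manufacturing,broadcast media,home furnishing,education
+# Process type
+processType=inference,training
+# Algorithm category
+appCategory=conversational Q&A
 # Framework type
 frameType=pytorch
+# Accelerator type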
+accelerateType=BW1000,K100AI
\ No newline at end of file
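
For tooling that needs to read these fields, a `key=value` file with `#` comments can be parsed in a few lines. This is a sketch under assumptions: `parse_properties` and `split_list` are hypothetical helpers, and the parser ignores escape sequences and line continuations that full Java-style `.properties` files allow.

```python
def parse_properties(text):
    """Parse simple key=value lines, skipping blank lines and '#' comments."""
    props = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue
        key, separator, value = line.partition("=")
        if separator:  # ignore lines that contain no '='
            props[key.strip()] = value.strip()
    return props


def split_list(value):
    """Split a comma-separated value into a list of trimmed, non-empty items."""
    return [item.strip() for item in value.split(",") if item.strip()]


SAMPLE = """
# framework
frameType=pytorch
# accelerators
accelerateType=BW1000,K100AI
"""

properties = parse_properties(SAMPLE)
accelerators = split_list(properties["accelerateType"])
```

With the sample above, `accelerators` holds the two card names listed under `accelerateType`.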