init chatglm

Commit 42b5feac authored Jun 13, 2024 by zhuwenwen
8 changed files
--- a/.gitmodules
+++ b/.gitmodules
+[submodule "vllm"]
+	path = vllm
+	url = http://developer.hpccube.com/codes/OpenDAS/vllm.git
+	branch = vllm-v0.3.3-dtk24.04
--- a/Dockerfile
+++ b/Dockerfile
+FROM image.sourcefind.cn:5000/dcu/admin/base/custom:vllm0.3.3-dtk24.04-centos7.6-py310-v1
+ENV LANG C.UTF-8
--- a/LICENSE
+++ b/LICENSE
+The glm-4-9b License
+1. 定义
+“许可方”是指分发其软件的 glm-4-9b 模型团队。
+“软件”是指根据本许可提供的 glm-4-9b 模型参数。
+2. 许可授予
+根据本许可的条款和条件，许可方特此授予您非排他性、全球性、不可转让、不可再许可、可撤销、免版税的版权许可。
+本许可允许您免费使用本仓库中的所有开源模型进行学术研究，对于希望将模型用于商业目的的用户，需在[这里](https://open.bigmodel.cn/mla/form)完成登记。经过登记的用户可以免费使用本模型进行商业活动，但必须遵守本许可的所有条款和条件。
+上述版权声明和本许可声明应包含在本软件的所有副本或重要部分中。
+如果您分发或提供 THUDM / 智谱AI 关于 glm-4 开源模型的材料（或其任何衍生作品），或使用其中任何材料（包括 glm-4 系列的所有开源模型）的产品或服务，您应:
+(A) 随任何此类 THUDM / 智谱AI 材料提供本协议的副本；
+(B) 在相关网站、用户界面、博客文章、关于页面或产品文档上突出显示 “Built with glm-4”。
+如果您使用 THUDM / 智谱AI的 glm-4 开源模型的材料来创建、训练、微调或以其他方式改进已分发或可用的 AI 模型，您还应在任何此类 AI 模型名称的开头添加 “glm-4”。
+3. 限制
+您不得出于任何军事或非法目的使用、复制、修改、合并、发布、分发、复制或创建本软件的全部或部分衍生作品。
+您不得利用本软件从事任何危害国家安全和国家统一，危害社会公共利益及公序良俗，侵犯他人商业秘密、知识产权、名誉权、肖像权、财产权等权益的行为。
+您在使用中应遵循使用地所适用的法律法规政策、道德规范等要求。
+4. 免责声明
+本软件“按原样”提供，不提供任何明示或暗示的保证，包括但不限于对适销性、特定用途的适用性和非侵权性的保证。
+在任何情况下，作者或版权持有人均不对任何索赔、损害或其他责任负责，无论是在合同诉讼、侵权行为还是其他方面，由软件或软件的使用或其他交易引起、由软件引起或与之相关
+软件。
+5. 责任限制
+除适用法律禁止的范围外，在任何情况下且根据任何法律理论，无论是基于侵权行为、疏忽、合同、责任或其他原因，任何许可方均不对您承担任何直接、间接、特殊、偶然、示范性、
+或间接损害，或任何其他商业损失，即使许可人已被告知此类损害的可能性。
+6. 争议解决
+本许可受中华人民共和国法律管辖并按其解释。 因本许可引起的或与本许可有关的任何争议应提交北京市海淀区人民法院。
+请注意，许可证可能会更新到更全面的版本。 有关许可和版权的任何问题，请通过 license@zhipuai.cn 与我们联系。
+1. Definitions
+“Licensor” means the glm-4-9b Model Team that distributes its Software.
+“Software” means the glm-4-9b model parameters made available under this license.
+2. License
+Under the terms and conditions of this license, the Licensor hereby grants you a non-exclusive, worldwide, non-transferable, non-sublicensable, revocable, royalty-free copyright license.
+This license allows you to use all open source models in this repository for free for academic research. For users who wish to use the models for commercial purposes, please do so [here](https://open.bigmodel.cn/mla/form)
+Complete registration. Registered users are free to use this model for commercial activities, but must comply with all terms and conditions of this license.
+The copyright notice and this license notice shall be included in all copies or substantial portions of the Software.
+If you distribute or provide THUDM / Zhipu AI materials on the glm-4 open source model (or any derivative works thereof), or products or services that use any materials therein (including all open source models of the glm-4 series), you should:
+(A) Provide a copy of this Agreement with any such THUDM/Zhipu AI Materials;
+(B) Prominently display "Built with glm-4" on the relevant website, user interface, blog post, related page or product documentation.
+If you use materials from THUDM/Zhipu AI's glm-4 model to create, train, operate, or otherwise improve assigned or available AI models, you should also add "glm-4" to the beginning of any such AI model name.
+3. Restrictions
+You are not allowed to use, copy, modify, merge, publish, distribute, copy or create all or part of the derivative works of this software for any military or illegal purposes.
+You are not allowed to use this software to engage in any behavior that endangers national security and unity, endangers social public interests and public order, infringes on the rights and interests of others such as trade secrets, intellectual property rights, reputation rights, portrait rights, and property rights.
+You should comply with the applicable laws, regulations, policies, ethical standards, and other requirements in the place of use during use.
+4. Disclaimer
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE
+WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR
+COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR
+OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
+5. Limitation of Liability
+EXCEPT TO THE EXTENT PROHIBITED BY APPLICABLE LAW, IN NO EVENT AND UNDER NO LEGAL THEORY, WHETHER BASED IN TORT,
+NEGLIGENCE, CONTRACT, LIABILITY, OR OTHERWISE WILL ANY LICENSOR BE LIABLE TO YOU FOR ANY DIRECT, INDIRECT, SPECIAL,
+INCIDENTAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES, OR ANY OTHER COMMERCIAL LOSSES, EVEN IF THE LICENSOR HAS BEEN ADVISED
+OF THE POSSIBILITY OF SUCH DAMAGES.
+6. Dispute Resolution
+This license shall be governed and construed in accordance with the laws of People’s Republic of China. Any dispute
+arising from or in connection with this License shall be submitted to Haidian District People's Court in Beijing.
+Note that the license is subject to update to a more comprehensive version. For any questions related to the license and
+copyright, please contact us at license@zhipuai.cn.
\ No newline at end of file
--- a/README.md
+++ b/README.md
+<!--
+ * @Author: zhuww
+ * @email: zhuww@sugon.com
+ * @Date: 2024-06-13 14:38:07
+ * @LastEditTime: 2024-06-13 16:16:01
+-->
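+## Paper
+`GLM: General Language Model Pretraining with Autoregressive Blank Infilling`
+- [https://arxiv.org/abs/2103.10360](https://arxiv.org/abs/2103.10360)
+## Model Architecture
+ChatGLM-6B is an open-source bilingual (Chinese and English) dialogue language model from Tsinghua University. It is based on the [General Language Model (GLM)](https://github.com/THUDM/GLM) architecture and has 6.2 billion parameters. ChatGLM-6B uses techniques similar to ChatGPT and is optimized for Chinese question answering and dialogue. After bilingual training on about 1T tokens, supplemented by supervised fine-tuning, feedback bootstrapping, and reinforcement learning from human feedback, the 6.2-billion-parameter ChatGLM-6B can generate responses that align well with human preferences. ChatGLM2-6B is the second generation of the open-source bilingual dialogue model ChatGLM-6B. ChatGLM3 is a newer generation of pretrained dialogue models released jointly by Zhipu AI and the KEG Lab at Tsinghua University. ChatGLM3-6B is the open-source model in the ChatGLM3 series; while keeping the fluent dialogue and low deployment barrier of the previous two generations, it offers a stronger base model, more complete feature support, and a more comprehensive open-source series.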
+<div align="center">
+<img src="docs/transformers.jpg" width="300" height="400">
+</div>
+The main network configuration parameters of the ChatGLM series models are as follows:
+| Model       | Hidden size | Layers | Heads | Vocabulary size | Position encoding | Max sequence length |
+| ----------- | ----------- | ------ | ----- | --------------- | ----------------- | ------------------- |
+| ChatGLM-6B  | 4096        | 28     | 32    | 130528          | RoPE              | 2048                |
+| ChatGLM2-6B | 4096        | 28     | 32    | 65024           | RoPE              | 8192                |
+| ChatGLM3-6B | 4096        | 28     | 32    | 65024           | RoPE              | 8192                |
+| glm-4-9b    | 4096        | 40     | 32    | 151552          | RoPE              | 131072              |
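The table can also be read as data. The sketch below copies its values into a small structure and derives the per-head width, assuming the hidden size is split evenly across attention heads; `ModelConfig` and `head_dim` are illustrative names, not part of any ChatGLM or vLLM API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelConfig:
    name: str
    hidden_size: int
    num_layers: int
    num_heads: int
    vocab_size: int
    max_seq_len: int

    @property
    def head_dim(self) -> int:
        # Assumes the hidden size divides evenly across the attention heads.
        return self.hidden_size // self.num_heads


# Values copied from the table above.
CONFIGS = [
    ModelConfig("ChatGLM-6B", 4096, 28, 32, 130528, 2048),
    ModelConfig("ChatGLM2-6B", 4096, 28, 32, 65024, 8192),
    ModelConfig("ChatGLM3-6B", 4096, 28, 32, 65024, 8192),
    ModelConfig("glm-4-9b", 4096, 40, 32, 151552, 131072),
]

for cfg in CONFIGS:
    print(f"{cfg.name}: head_dim={cfg.head_dim}, layers={cfg.num_layers}")
```

Under that assumption every model in the table has a head width of 128; the rows differ in depth, vocabulary size, and context length.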
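+## Algorithm
+The ChatGLM series models are built on the GLM architecture. GLM is a Transformer-based language model trained with an autoregressive blank-infilling objective, which gives it both autoregressive and autoencoding capabilities.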
+<div align="center">
+<img src="docs/GLM.png" width="550" height="200">
+</div>
+## Environment Setup
+### Docker (Method 1)
+Pull the inference Docker image from [光源 (SourceFind)](https://www.sourcefind.cn/#/image/dcu/custom):
+```
+docker pull image.sourcefind.cn:5000/dcu/admin/base/custom:vllm0.3.3-dtk24.04-centos7.6-py310-v1
+# Replace <Image ID> with the ID of the Docker image pulled above
+# <Host Path>: path on the host
+# <Container Path>: mount path inside the container
+docker run -it --name chatglm_vllm --privileged --shm-size=64G  --device=/dev/kfd --device=/dev/dri/ --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --ulimit memlock=-1:-1 --ipc=host --network host --group-add video -v /opt/hyhal:/opt/hyhal -v <Host Path>:<Container Path> <Image ID> /bin/bash
+```
+### Dockerfile (Method 2)
+```
+# <Host Path>: path on the host
+# <Container Path>: mount path inside the container
+docker build -t chatglm:latest .
+docker run -it --name chatglm_vllm --privileged --shm-size=64G  --device=/dev/kfd --device=/dev/dri/ --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --ulimit memlock=-1:-1 --ipc=host --network host --group-add video -v /opt/hyhal:/opt/hyhal -v <Host Path>:<Container Path> chatglm:latest /bin/bash
+```
+### Anaconda (Method 3)
+```
+conda create -n chatglm_vllm python=3.10
+# Activate the new environment so that pip installs into it
+conda activate chatglm_vllm
+pip install aiohttp==3.9.1 outlines==0.0.37 openai==1.23.3
+```
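+The special deep learning libraries required by the DCU cards used in this project can be downloaded and installed from the [光合](https://developer.hpccube.com/tool/) developer community.
+* DTK driver: dtk24.04
+* Pytorch: 2.1.0
+* triton: 2.1.0
+* vllm: 0.3.3
+* xformers: 0.0.25
+* flash_attn: 2.0.4
+* python: python3.10
+`Tips: the versions of the DTK driver, python, torch, and the other DCU-related tools listed above must match exactly. Currently this only runs on K100_AI.`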
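Because the versions must match exactly, a quick check of the active environment can save time. The following is a minimal sketch using only the standard library. It assumes the pip distribution names match the list above and that vendor builds may append a local tag after `+` (for example `2.1.0+abc`), which is ignored in the comparison. `find_mismatches` and `installed_versions` are hypothetical helper names.

```python
from importlib import metadata

# Versions copied from the list above; the keys are assumed pip distribution names.
EXPECTED = {
    "torch": "2.1.0",
    "triton": "2.1.0",
    "vllm": "0.3.3",
    "xformers": "0.0.25",
    "flash_attn": "2.0.4",
}


def find_mismatches(expected, installed):
    """Return {name: (expected, found)} for packages that are missing or differ."""
    mismatches = {}
    for name, want in expected.items():
        found = installed.get(name)
        # Compare only the public version; drop any local tag after "+".
        if found is None or found.split("+")[0] != want:
            mismatches[name] = (want, found)
    return mismatches


def installed_versions(names):
    """Look up installed versions; packages that are not installed map to None."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, (want, found) in find_mismatches(EXPECTED, installed_versions(EXPECTED)).items():
        print(f"{name}: expected {want}, found {found}")
```

The script prints nothing when every package matches. The DTK driver itself is not a Python package and is not covered by this check.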
+## Dataset
+None
+## Inference
+### Build and install from source
+```
+# If you use the 光源 (SourceFind) image, you can skip this step: vllm is already installed in the image.
+git clone http://developer.hpccube.com/codes/modelzoo/chatglm_vllm.git
+cd chatglm_vllm
+git submodule init && git submodule update
+cd vllm
+pip install wheel
+python setup.py bdist_wheel
+cd dist && pip install vllm*
+```
+### Model download
+| Chat model | Long-context model |
+| --- | --- |
+| [chatglm-6b](https://huggingface.co/THUDM/chatglm-6b) | |
+| [chatglm2-6b](https://huggingface.co/THUDM/chatglm2-6b) | [chatglm2-6b-32k](https://huggingface.co/THUDM/chatglm2-6b-32k) |
+| [chatglm3-6b](https://huggingface.co/THUDM/chatglm3-6b) | [chatglm3-6b-32k](https://huggingface.co/THUDM/chatglm3-6b-32k) |
+| [glm-4-9b-chat](https://huggingface.co/THUDM/glm-4-9b-chat) | |
+### Offline batch inference
+```bash
+python vllm/examples/offline_inference.py
+```
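+Here `prompts` is the list of prompts. `temperature` controls sampling randomness: lower values make generation more deterministic, higher values make it more random, 0 means greedy sampling, and the default is 1. `max_tokens=16` sets the maximum number of generated tokens (the default is 16).
+`model` is the model path. `tensor_parallel_size=1` is the number of cards to use (default 1). `dtype="float16"` is the inference data type; if the model weights are stored in bfloat16, set this to float16 for inference. `quantization="gptq"` runs inference with GPTQ quantization, which requires downloading a GPTQ-quantized version of the model.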
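The parameters described above map onto vLLM's `LLM` and `SamplingParams` classes. Below is a minimal sketch of how they might be wired together. `build_llm_kwargs` and `run` are hypothetical helpers, the `trust_remote_code=True` and `enforce_eager=True` settings are assumptions carried over from the benchmark commands in this README, and `vllm` is imported lazily so the helper can be inspected on a machine without it.

```python
def build_llm_kwargs(model, tensor_parallel_size=1, dtype="float16", quantization=None):
    """Collect the engine arguments described above into one dict."""
    kwargs = {
        "model": model,
        "tensor_parallel_size": tensor_parallel_size,
        "dtype": dtype,
        # Assumed settings, mirroring the benchmark commands in this README.
        "trust_remote_code": True,
        "enforce_eager": True,
    }
    if quantization is not None:
        # For example "gptq"; requires a matching quantized checkpoint.
        kwargs["quantization"] = quantization
    return kwargs


def run(prompts, model="THUDM/glm-4-9b-chat"):
    # Imported here so that the helper above works without vllm installed.
    from vllm import LLM, SamplingParams

    sampling_params = SamplingParams(temperature=0, max_tokens=16)
    llm = LLM(**build_llm_kwargs(model))
    for output in llm.generate(prompts, sampling_params):
        print(f"Prompt: {output.prompt!r}, Generated text: {output.outputs[0].text!r}")
```

Calling `run(["晚上睡不着怎么办"])` inside the container would load the model and print one completion per prompt.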
+### Offline batch inference performance test
+1. Specify input and output lengths
+```bash
+python vllm/benchmarks/benchmark_throughput.py --num-prompts 1 --input-len 32 --output-len 128 --model THUDM/glm-4-9b-chat -tp 1 --trust-remote-code --enforce-eager --dtype float16
+```
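+Here `--num-prompts` is the batch size, `--input-len` is the input sequence length, `--output-len` is the number of output tokens, `--model` is the model path, `-tp` is the number of cards to use, and `--dtype float16` is the inference data type; if the model weights are stored in bfloat16, set this to float16 for inference. Setting `--output-len 1` measures first-token latency. `-q gptq` runs inference with a GPTQ-quantized model.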
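To interpret the numbers the benchmark prints, it helps to see how throughput follows from these flags. The sketch below assumes the token rate counts both prompt and generated tokens. The `throughput` helper and the 4.0 s runtime are illustrative only, not measured results.

```python
def throughput(num_prompts, input_len, output_len, elapsed_s):
    """Return (requests per second, tokens per second) for one benchmark run."""
    if elapsed_s <= 0:
        raise ValueError("elapsed_s must be positive")
    # Prompt and generated tokens are both counted, as assumed in the lead-in.
    total_tokens = num_prompts * (input_len + output_len)
    return num_prompts / elapsed_s, total_tokens / elapsed_s


# Same shape as the command above; the 4.0 s runtime is a made-up number.
requests_per_s, tokens_per_s = throughput(
    num_prompts=1, input_len=32, output_len=128, elapsed_s=4.0
)
print(f"{requests_per_s:.2f} requests/s, {tokens_per_s:.2f} tokens/s")
```

With `--output-len 1` nearly all of the elapsed time is spent processing the prompt, which is why that setting approximates first-token latency.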
+2. Use a dataset
+Download the dataset:
+```bash
+wget https://huggingface.co/datasets/anon8231489123/ShareGPT_Vicuna_unfiltered/resolve/main/ShareGPT_V3_unfiltered_cleaned_split.json
+```
+```bash
+python vllm/benchmarks/benchmark_throughput.py --num-prompts 1 --model THUDM/glm-4-9b-chat --dataset ShareGPT_V3_unfiltered_cleaned_split.json -tp 1 --trust-remote-code --enforce-eager --dtype float16
+```
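+Here `--num-prompts` is the batch size, `--model` is the model path, `--dataset` is the dataset to use, `-tp` is the number of cards to use, and `--dtype float16` is the inference data type; if the model weights are stored in bfloat16, set this to float16 for inference. `-q gptq` runs inference with a GPTQ-quantized model.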
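Before running the benchmark it can be useful to look at what the dataset contains. The sketch below assumes the ShareGPT file is a JSON list whose entries carry a `conversations` list of `{"from", "value"}` turns, and it keeps only the first two turns of each conversation as a prompt and reference completion. `first_turn_pairs`, `load_pairs`, and `SAMPLE` are hypothetical names and data for illustration.

```python
import json


def first_turn_pairs(records):
    """Return (prompt, completion) pairs from the first two turns of each conversation."""
    pairs = []
    for record in records:
        turns = record.get("conversations", [])
        if len(turns) < 2:
            # Skip entries that have no reply to pair with the prompt.
            continue
        pairs.append((turns[0]["value"], turns[1]["value"]))
    return pairs


def load_pairs(path):
    """Read a ShareGPT-style JSON file and extract its first-turn pairs."""
    with open(path, encoding="utf-8") as f:
        return first_turn_pairs(json.load(f))


# Tiny hand-written sample in the assumed layout (not real dataset content).
SAMPLE = [
    {"id": "a", "conversations": [
        {"from": "human", "value": "What is RoPE?"},
        {"from": "gpt", "value": "A rotary position encoding."},
    ]},
    {"id": "b", "conversations": [{"from": "human", "value": "Unanswered"}]},
]

print(first_turn_pairs(SAMPLE))
```

After the download above, `load_pairs("ShareGPT_V3_unfiltered_cleaned_split.json")` would apply the same extraction to the real file.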
+### API server inference performance test
+1. Start the server:
+```bash
+python -m vllm.entrypoints.api_server --model THUDM/glm-4-9b-chat --dtype float16 --enforce-eager -tp 1 --trust-remote-code
+```
+2. Start the client:
+```bash
+python vllm/benchmarks/benchmark_serving.py --model THUDM/glm-4-9b-chat --dataset ShareGPT_V3_unfiltered_cleaned_split.json  --num-prompts 1 --trust-remote-code
+```
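+The parameters are the same as in the dataset-based offline batch inference performance test; for details, see `vllm/benchmarks/benchmark_serving.py`.
+### OpenAI-compatible server
+Start the server: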
+```bash
+python -m vllm.entrypoints.openai.api_server --model THUDM/glm-4-9b-chat --enforce-eager --dtype float16 --trust-remote-code
+```
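+Here `--model` is the path of the model to load and `--dtype` is the data type (float16). By default the chat template predefined in the tokenizer is used; `--chat-template` supplies a new template that overrides it. `-q gptq` runs inference with a GPTQ-quantized model.
+List the available models: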
+```bash
+curl http://localhost:8000/v1/models
+```
+### Using the OpenAI Completions API with vLLM
+```bash
+curl http://localhost:8000/v1/completions \
+    -H "Content-Type: application/json" \
+    -d '{
+        "model": "THUDM/glm-4-9b-chat",
+        "prompt": "晚上睡不着怎么办",
+        "max_tokens": 7,
+        "temperature": 0
+    }'
+```
+Alternatively, use [vllm/examples/openai_completion_client.py](https://developer.hpccube.com/codes/OpenDAS/vllm/-/blob/675c0abe47eb9d29c126fbecda86fd5801162eba/examples/openai_completion_client.py)
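The same request can be sent from Python without extra dependencies. This is a minimal standard-library sketch of the curl call above, not the linked example client. `build_completion_request` and `complete` are hypothetical helpers, and the server is assumed to be listening on `localhost:8000` as in the commands above.

```python
import json
import urllib.request


def build_completion_request(prompt, model="THUDM/glm-4-9b-chat",
                             base_url="http://localhost:8000/v1",
                             max_tokens=7, temperature=0):
    """Build the POST request for /v1/completions without sending it."""
    payload = {
        "model": model,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        f"{base_url}/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(prompt):
    """Send the request and return the generated text (needs a running server)."""
    with urllib.request.urlopen(build_completion_request(prompt)) as resp:
        body = json.load(resp)
    # The Completions API returns the text under choices[0].text.
    return body["choices"][0]["text"]
```

Keeping request construction separate from sending makes the payload easy to inspect before a server is available.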
+### Using the OpenAI Chat API with vLLM
+```bash
+curl http://localhost:8000/v1/chat/completions \
+    -H "Content-Type: application/json" \
+    -d '{
+        "model": "THUDM/glm-4-9b-chat",
+        "messages": [
+            {"role": "system", "content": "晚上睡不着怎么办"},
+            {"role": "user", "content": "晚上睡不着怎么办"}
+        ]
+    }'
+```
+Alternatively, use [vllm/examples/openai_chatcompletion_client.py](https://developer.hpccube.com/codes/OpenDAS/vllm/-/blob/675c0abe47eb9d29c126fbecda86fd5801162eba/examples/openai_chatcompletion_client.py)
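For the chat endpoint, the request carries a list of role-tagged messages and the reply comes back under `choices[0].message.content`. The sketch below builds such a payload and extracts the reply from a response dictionary. `build_chat_payload`, `extract_reply`, the system prompt, and the sample response are all illustrative assumptions, not server output.

```python
import json


def build_chat_payload(user_message, system_message=None, model="THUDM/glm-4-9b-chat"):
    """Build the JSON body for /v1/chat/completions."""
    messages = []
    if system_message is not None:
        messages.append({"role": "system", "content": system_message})
    messages.append({"role": "user", "content": user_message})
    return {"model": model, "messages": messages}


def extract_reply(response):
    """Return the assistant text from a chat completion response dict."""
    return response["choices"][0]["message"]["content"]


payload = build_chat_payload("晚上睡不着怎么办", system_message="You are a helpful assistant.")
print(json.dumps(payload, ensure_ascii=False, indent=2))

# Hand-written response in the OpenAI chat format, for illustration only.
sample_response = {
    "choices": [{"index": 0, "message": {"role": "assistant", "content": "Try a regular sleep schedule."}}]
}
print(extract_reply(sample_response))
```

The payload can be posted with any HTTP client, with `Content-Type: application/json` set as in the curl command above.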
+## Result
+Accelerator used: 1 x DCU-K100_AI-64G
+```
+Prompt: '晚上睡不着怎么办', Generated text: '？\n晚上睡不着可以尝试以下方法来改善睡眠质量：\n\n1. **调整作息时间**：尽量每天同一时间上床睡觉和起床，建立规律的生物钟。\n\n2. **放松身心**：睡前进行深呼吸、冥想或瑜伽等放松活动，有助于减轻压力和焦虑。\n\n3. **避免咖啡因和酒精**：晚上避免摄入咖啡因和酒精，因为它们可能会干扰睡眠。\n\n'
+```
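+### Accuracy
+None
+## Application Scenarios
+### Algorithm category
+Dialogue question answering
+### Key application industries
+Healthcare, finance, scientific research, education
+## Source repository and issue feedback
+* [https://developer.hpccube.com/codes/modelzoo/chatglm_vllm](https://developer.hpccube.com/codes/modelzoo/chatglm_vllm)
+## References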
+* [https://github.com/vllm-project/vllm](https://github.com/vllm-project/vllm)
+* [https://github.com/THUDM/ChatGLM3](https://github.com/THUDM/ChatGLM3)
\ No newline at end of file
--- a/docs/GLM.png
+++ b/docs/GLM.png
--- a/docs/transformers.jpg
+++ b/docs/transformers.jpg
--- a/model.properties
+++ b/model.properties
+# Unique model identifier
+modelCode = 699
+# Model name
+modelName=chatglm_vllm
+# Model description
+modelDescription=ChatGLM3是智谱AI与清华大学KEG实验室联合发布的新一代对话预训练模型
+# Application scenarios
+appScenario=推理,对话问答,医疗,科研,金融,教育
+# Framework type
+frameType=vllm
--- a/vllm @ df6349c7
+++ b/vllm @ df6349c7
+Subproject commit df6349c78b49a5b8f6f600d0d9490791cd1d32ee