Commit 67ca83cf authored by Rayyyyy

Support GLM-4-0414

parent 78ba9d16
@@ -15,47 +15,45 @@ GLM-4-9B is the open-source version in the GLM-4 series, the latest generation of pre-trained models released by Zhipu AI
</div>

## Environment Setup
`-v path`, `docker_name`, and `imageID` below should be adjusted to your actual setup.
### Docker (Method 1)
```bash
docker pull image.sourcefind.cn:5000/dcu/admin/base/pytorch:2.4.1-ubuntu22.04-dtk25.04-py3.10
docker run -it --shm-size 200g --network=host --name {docker_name} --privileged --device=/dev/kfd --device=/dev/dri --device=/dev/mkfd --group-add video --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -u root -v /path/your_code_data/:/path/your_code_data/ -v /opt/hyhal/:/opt/hyhal/:ro {imageID} bash
cd /your_code_path/glm-4_pytorch
pip install -r inference/requirements.txt
pip install -r finetune/requirements.txt
```
### Dockerfile (Method 2)
```bash
cd ./docker
docker build --no-cache -t glm4-9b:latest .
docker run -it --shm-size 200g --network=host --name {docker_name} --privileged --device=/dev/kfd --device=/dev/dri --device=/dev/mkfd --group-add video --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -u root -v /path/your_code_data/:/path/your_code_data/ -v /opt/hyhal/:/opt/hyhal/:ro {imageID} bash
cd /your_code_path/glm-4_pytorch
pip install -r inference/requirements.txt
pip install -r finetune/requirements.txt
```
### Anaconda (Method 3)
1. The DCU-specific deep learning libraries required by this project can be downloaded and installed from the developer community: https://developer.hpccube.com/tool/
```bash
DTK: 25.04
python: 3.10
torch: 2.4.1
deepspeed: 0.14.2+das.opt2.dtk2504
```
**Tips**: the versions of the DTK stack, python, torch, and the other DCU-related tools listed above must correspond to each other exactly.
2. The remaining, ordinary libraries can be installed directly with the following steps:
```bash
pip install -r inference/requirements.txt
pip install -r finetune/requirements.txt
```
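After the libraries are installed, you can quickly confirm that the DCU build of torch is usable. The snippet below is a minimal sketch; it assumes the DTK/DAS build of torch exposes DCU devices through the usual `torch.cuda` API:
```python
import torch

# Print the torch build and check that DCU devices are visible.
# On DTK/DAS builds of torch the devices are typically exposed via the torch.cuda API (assumption).
print("torch version:", torch.__version__)
print("device available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("device count:", torch.cuda.device_count())
    print("device name:", torch.cuda.get_device_name(0))
```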
## Dataset
@@ -80,33 +78,93 @@ python gen_messages_data.py --data_path /path/to/AdvertiseGen
- Here is an example without tools:
```json
{
  "messages": [
    {
      "role": "user",
      "content": "类型#裤*材质#牛仔布*风格#性感"
    },
    {
      "role": "assistant",
      "content": "3x1的这款牛仔裤采用浅白的牛仔面料为裤身材质,其柔然的手感和细腻的质地,在穿着舒适的同时,透露着清纯甜美的个性气质。除此之外,流畅的裤身剪裁将性感的腿部曲线彰显的淋漓尽致,不失为一款随性出街的必备单品。"
    }
  ]
}
```
- Here is an example with a tool call:
```json
{
  "messages": [
    {
      "role": "system",
      "content": "",
      "tools": [
        {
          "type": "function",
          "function": {
            "name": "get_recommended_books",
            "description": "Get recommended books based on user's interests",
            "parameters": {
              "type": "object",
              "properties": {
                "interests": {
                  "type": "array",
                  "items": {
                    "type": "string"
                  },
                  "description": "The interests to recommend books for"
                }
              },
              "required": [
                "interests"
              ]
            }
          }
        }
      ]
    },
    {
      "role": "user",
      "content": "Hi, I am looking for some book recommendations. I am interested in history and science fiction."
    },
    {
      "role": "assistant",
      "content": "{\"name\": \"get_recommended_books\", \"arguments\": {\"interests\": [\"history\", \"science fiction\"]}}"
    },
    {
      "role": "observation",
      "content": "{\"books\": [\"Sapiens: A Brief History of Humankind by Yuval Noah Harari\", \"A Brief History of Time by Stephen Hawking\", \"Dune by Frank Herbert\", \"The Martian by Andy Weir\"]}"
    },
    {
      "role": "assistant",
      "content": "Based on your interests in history and science fiction, I would recommend the following books: \"Sapiens: A Brief History of Humankind\" by Yuval Noah Harari, \"A Brief History of Time\" by Stephen Hawking, \"Dune\" by Frank Herbert, and \"The Martian\" by Andy Weir."
    }
  ]
}
```
- The `system` role is optional, but if present it must appear before the `user` role, and a complete conversation (whether single-turn or multi-turn) may contain the `system` role only once.
- The `tools` field is optional; if present it must appear after the `system` role, and a complete conversation (whether single-turn or multi-turn) may contain the `tools` field only once. When the `tools` field is present, the `system` role must exist and its `content` field must be empty (see the validation sketch below).
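The constraints above can be checked programmatically before training. The following is a minimal validation sketch; the function name `check_messages` and the data path are illustrative, not part of the project:
```python
import json

def check_messages(record: dict) -> None:
    """Validate one fine-tuning record against the system/tools rules described above (illustrative only)."""
    messages = record["messages"]
    system_msgs = [m for m in messages if m["role"] == "system"]
    if len(system_msgs) > 1:
        raise ValueError("a conversation may contain at most one system message")
    if system_msgs and messages[0]["role"] != "system":
        raise ValueError("the system message must appear before the user messages")
    for m in messages:
        if "tools" in m:
            if m["role"] != "system":
                raise ValueError("the tools field may only be attached to the system message")
            if m.get("content", "") != "":
                raise ValueError("when tools are provided, the system content must be empty")

# Hypothetical path to a prepared .jsonl file.
with open("data/AdvertiseGen/saves/train.jsonl", encoding="utf-8") as f:
    for line in f:
        check_messages(json.loads(line))
```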
## Training
Download the pre-trained model via [预训练权重](#预训练权重) (pre-trained weights); the current examples use the [GLM-4-9B-chat](https://huggingface.co/THUDM/glm-4-9b-chat) or [GLM-4-9B-0414](https://huggingface.co/THUDM/GLM-4-9B-0414) model.
### Native Training Method
1. Enter the `finetune` directory:
```bash
cd finetune
```
2. The configuration files are located in the [configs](./finetune/configs/) directory and include the following files:
- DeepSpeed configuration files: [ds_zereo_2](./finetune/configs/ds_zereo_2.json) and [ds_zereo_3](./finetune/configs/ds_zereo_3.json)
- `lora.yaml / sft.yaml`: configuration files for the different fine-tuning methods, covering model parameters, optimizer parameters, training parameters, and so on. Some important parameters are explained below (an illustrative sketch follows this list):
+ data_config section
+ train_file: file path of the training dataset.
+ val_file: file path of the validation dataset.
@@ -143,20 +201,17 @@ pip install -r requirements.txt
+ num_attention_heads: 2: number of attention heads for P-Tuning v2 (do not change).
+ token_dim: 256: token dimension for P-Tuning v2 (do not change).
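To make the relationship between these fields concrete, here is an illustrative Python sketch that loads a fine-tuning config and reads the `data_config` paths; the key names mirror the explanations above, but the exact layout of the shipped yaml files may differ:
```python
import yaml  # requires pyyaml

# Load a fine-tuning config and print the dataset paths (illustrative; adjust the path as needed).
with open("configs/lora.yaml", encoding="utf-8") as f:
    config = yaml.safe_load(f)

data_config = config["data_config"]              # the data_config section described above
print("train_file:", data_config["train_file"])  # path to the training dataset
print("val_file:", data_config["val_file"])      # path to the validation dataset
```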
#### Single Node, Single GPU
```shell
# For Chat Fine-tune
python finetune.py data/AdvertiseGen/ THUDM/GLM-4-9B-0414 configs/lora.yaml
```
#### Single Node Multi-GPU / Multi-Node Multi-GPU
`deepspeed` is used here as the acceleration scheme. Make sure the `deepspeed` library has already been installed in the current environment as described in the [environment setup section](#环境配置).
```shell
# For Chat Fine-tune
OMP_NUM_THREADS=1 torchrun --standalone --nnodes=1 --nproc_per_node=8 finetune.py data/AdvertiseGen/ THUDM/GLM-4-9B-0414 configs/lora.yaml
```
#### Fine-tuning from a Checkpoint
@@ -164,12 +219,12 @@ bash train_dp.sh
1. `yes`: automatically resume training from the **last saved checkpoint**, for example:
```shell
python finetune.py ../data/AdvertiseGen/saves/ THUDM/GLM-4-9B-0414 configs/lora.yaml yes
```
2. `XX`: a checkpoint number, e.g. `600` resumes training from **checkpoint 600**, for example:
```shell
python finetune.py ../data/AdvertiseGen/saves/ THUDM/GLM-4-9B-0414 configs/lora.yaml 600
```
### Llama Factory Fine-tuning Method (Recommended)
@@ -194,41 +249,26 @@ For example SFT training scripts, refer to the corresponding yaml files under `llama-factory/train_lora`.
The parameters are explained in the same way as in [#全参微调](#全参微调) (full-parameter fine-tuning).
## Inference
```shell
cd inference
```
### Using the transformers Backend
#### Chat with the GLM-4-9B Model from the Command Line
```shell
# Set MODEL_PATH in the code to the path of the model under test
# Defaults to the GLM-4-9B-0414 model
python trans_cli_demo.py
```
#### Chat with the GLM-4-9B-Chat Model via the Gradio Web UI
```shell
# Set MODEL_PATH in the code to the path of the model under test
# Defaults to the GLM-4-9B-0414 model
python trans_web_demo.py
```
### GLM-4-9B-0414 / GLM-4-32B-0414 / GLM-4-32B-Base-0414 Inference Script
```shell
python infer_glm4.py --model_path /path/of/model/ --message "你好"
```
......
'''based on transformers'''
import torch
import argparse
from transformers import AutoModelForCausalLM, AutoTokenizer

parse = argparse.ArgumentParser()
parse.add_argument('--model_name_or_path', default="THUDM/glm-4-9b-chat")
parse.add_argument('--device', default="cuda")
parse.add_argument('--query', type=str, default="你好")
args = parse.parse_args()

device = args.device
model_name_or_path = args.model_name_or_path
tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, trust_remote_code=True)

query = args.query
inputs = tokenizer.apply_chat_template([{"role": "user", "content": query}],
                                       add_generation_prompt=True,
                                       tokenize=True,
                                       return_tensors="pt",
                                       return_dict=True
                                       )
inputs = inputs.to(device)
model = AutoModelForCausalLM.from_pretrained(
    model_name_or_path,
    torch_dtype=torch.bfloat16,
    low_cpu_mem_usage=True,
    trust_remote_code=True
).to(device).eval()

gen_kwargs = {"max_length": 2500, "do_sample": True, "top_k": 1}
with torch.no_grad():
    outputs = model.generate(**inputs, **gen_kwargs)
    outputs = outputs[:, inputs['input_ids'].shape[1]:]
    print('Result', tokenizer.decode(outputs[0], skip_special_tokens=True))
transformers==4.40.0
huggingface-hub>=0.23.1
sentencepiece>=0.2.0
pydantic>=2.7.1
timm>=0.9.16
tiktoken>=0.7.0
accelerate>=0.30.1
sentence_transformers>=2.7.0
# web demo
gradio>=4.31.5
"""
This script creates an interactive web demo for the GLM-4-9B model using Gradio,
a Python library for building quick and easy UI components for machine learning models.
It's designed to showcase the capabilities of the GLM-4-9B model in a user-friendly interface,
allowing users to interact with the model through a chat-like interface.
"""
import os
import argparse
import torch
import gradio as gr
from threading import Thread
from typing import Union
from pathlib import Path
from peft import AutoPeftModelForCausalLM, PeftModelForCausalLM
from transformers import (
AutoModelForCausalLM,
AutoTokenizer,
PreTrainedModel,
PreTrainedTokenizer,
PreTrainedTokenizerFast,
StoppingCriteria,
StoppingCriteriaList,
TextIteratorStreamer
)
ModelType = Union[PreTrainedModel, PeftModelForCausalLM]
TokenizerType = Union[PreTrainedTokenizer, PreTrainedTokenizerFast]
# add model path
parser = argparse.ArgumentParser()
parser.add_argument('--model_name_or_path', default='THUDM/glm-4-9b-chat')
args = parser.parse_args()
# MODEL_PATH = os.environ.get('MODEL_PATH', 'THUDM/glm-4-9b-chat')
MODEL_PATH = args.model_name_or_path
TOKENIZER_PATH = os.environ.get("TOKENIZER_PATH", MODEL_PATH)
def _resolve_path(path: Union[str, Path]) -> Path:
return Path(path).expanduser().resolve()
def load_model_and_tokenizer(
model_dir: Union[str, Path], trust_remote_code: bool = True
) -> tuple[ModelType, TokenizerType]:
model_dir = _resolve_path(model_dir)
if (model_dir / 'adapter_config.json').exists():
model = AutoPeftModelForCausalLM.from_pretrained(
model_dir, trust_remote_code=trust_remote_code, device_map='auto'
)
tokenizer_dir = model.peft_config['default'].base_model_name_or_path
else:
model = AutoModelForCausalLM.from_pretrained(
model_dir, trust_remote_code=trust_remote_code, device_map='auto'
)
tokenizer_dir = model_dir
tokenizer = AutoTokenizer.from_pretrained(
tokenizer_dir, trust_remote_code=trust_remote_code, use_fast=False
)
return model, tokenizer
model, tokenizer = load_model_and_tokenizer(MODEL_PATH, trust_remote_code=True)
class StopOnTokens(StoppingCriteria):
def __call__(self, input_ids: torch.LongTensor, scores: torch.FloatTensor, **kwargs) -> bool:
stop_ids = model.config.eos_token_id
for stop_id in stop_ids:
if input_ids[0][-1] == stop_id:
return True
return False
def parse_text(text):
lines = text.split("\n")
lines = [line for line in lines if line != ""]
count = 0
for i, line in enumerate(lines):
if "```" in line:
count += 1
items = line.split('`')
if count % 2 == 1:
lines[i] = f'<pre><code class="language-{items[-1]}">'
else:
lines[i] = f'<br></code></pre>'
else:
if i > 0:
if count % 2 == 1:
line = line.replace("`", "\`")
line = line.replace("<", "&lt;")
line = line.replace(">", "&gt;")
line = line.replace(" ", "&nbsp;")
line = line.replace("*", "&ast;")
line = line.replace("_", "&lowbar;")
line = line.replace("-", "&#45;")
line = line.replace(".", "&#46;")
line = line.replace("!", "&#33;")
line = line.replace("(", "&#40;")
line = line.replace(")", "&#41;")
line = line.replace("$", "&#36;")
lines[i] = "<br>" + line
text = "".join(lines)
return text
def predict(history, max_length, top_p, temperature):
stop = StopOnTokens()
messages = []
for idx, (user_msg, model_msg) in enumerate(history):
if idx == len(history) - 1 and not model_msg:
messages.append({"role": "user", "content": user_msg})
break
if user_msg:
messages.append({"role": "user", "content": user_msg})
if model_msg:
messages.append({"role": "assistant", "content": model_msg})
model_inputs = tokenizer.apply_chat_template(messages,
add_generation_prompt=True,
tokenize=True,
return_tensors="pt").to(next(model.parameters()).device)
streamer = TextIteratorStreamer(tokenizer, timeout=60, skip_prompt=True, skip_special_tokens=True)
generate_kwargs = {
"input_ids": model_inputs,
"streamer": streamer,
"max_new_tokens": max_length,
"do_sample": True,
"top_p": top_p,
"temperature": temperature,
"stopping_criteria": StoppingCriteriaList([stop]),
"repetition_penalty": 1.2,
"eos_token_id": model.config.eos_token_id,
}
t = Thread(target=model.generate, kwargs=generate_kwargs)
t.start()
for new_token in streamer:
if new_token:
history[-1][1] += new_token
yield history
with gr.Blocks() as demo:
gr.HTML("""<h1 align="center">GLM-4-9B Gradio Simple Chat Demo</h1>""")
chatbot = gr.Chatbot()
with gr.Row():
with gr.Column(scale=4):
with gr.Column(scale=12):
user_input = gr.Textbox(show_label=False, placeholder="Input...", lines=10, container=False)
with gr.Column(min_width=32, scale=1):
submitBtn = gr.Button("Submit")
with gr.Column(scale=1):
emptyBtn = gr.Button("Clear History")
max_length = gr.Slider(0, 32768, value=8192, step=1.0, label="Maximum length", interactive=True)
top_p = gr.Slider(0, 1, value=0.8, step=0.01, label="Top P", interactive=True)
temperature = gr.Slider(0.01, 1, value=0.6, step=0.01, label="Temperature", interactive=True)
def user(query, history):
return "", history + [[parse_text(query), ""]]
submitBtn.click(user, [user_input, chatbot], [user_input, chatbot], queue=False).then(
predict, [chatbot, max_length, top_p, temperature], chatbot
)
emptyBtn.click(lambda: None, None, chatbot, queue=False)
demo.queue()
demo.launch(server_name="127.0.0.1", server_port=8000, inbrowser=True, share=True)
accelerate
huggingface_hub>=0.19.4
ipykernel>=6.26.0
ipython>=8.18.1
jupyter_client>=8.6.0
langchain
langchain-community
matplotlib
pillow>=10.1.0
pymupdf
python-docx
python-pptx
pyyaml>=6.0.1
requests>=2.31.0
sentencepiece
streamlit>=1.35.0
tiktoken
transformers==4.40.0
zhipuai>=2.1.0
# Please install vllm if you'd like to use the long-context model.
# vllm
BROWSER_SERVER_URL = 'http://localhost:3000'
IPYKERNEL = 'glm-4-demo'
ZHIPU_AI_KEY = ''
COGVIEW_MODEL = 'cogview-3'
@@ -38,31 +38,39 @@ pnpm install
1. Modify `BING_SEARCH_API_KEY` in `browser/src/config.ts` to configure the Bing Search API Key that the browser service needs to use:
```ts
export default {
  BROWSER_TIMEOUT: 10000,
  BING_SEARCH_API_URL: 'https://api.bing.microsoft.com/v7.0',
  BING_SEARCH_API_KEY: '<PUT_YOUR_BING_SEARCH_KEY_HERE>',
  HOST: 'localhost',
  PORT: 3000,
};
```
If you registered for the Bing Custom Search API, you can modify your configuration file as follows and fill in your Custom Configuration ID:
```ts
export default {
  LOG_LEVEL: 'debug',
  BROWSER_TIMEOUT: 10000,
  BING_SEARCH_API_URL: 'https://api.bing.microsoft.com/v7.0/custom/',
  BING_SEARCH_API_KEY: 'YOUR_BING_SEARCH_API_KEY',
  CUSTOM_CONFIG_ID: 'YOUR_CUSTOM_CONFIG_ID', // put your Custom Configuration ID here
  HOST: 'localhost',
  PORT: 3000,
};
```
2. The text-to-image feature needs to call the CogView API. Modify `src/tools/config.py` and provide the [Zhipu AI Open Platform](https://open.bigmodel.cn) API Key required for text-to-image:
```python
BROWSER_SERVER_URL = 'http://localhost:3000'
IPYKERNEL = 'glm-4-demo'
ZHIPU_AI_KEY = '<PUT_YOUR_ZHIPU_AI_KEY_HERE>'
COGVIEW_MODEL = 'cogview-3'
```
@@ -82,11 +90,13 @@ pnpm install
You will then see the demo URL in the command line; click it to open the demo. The first visit needs to download and load the model, which may take some time.
If the model has already been downloaded locally, you can load it from the local path by setting `export *_MODEL_PATH=/path/to/model`. The models that can be specified include:
- `CHAT_MODEL_PATH`: used for All Tools mode and document interpretation mode, defaults to `THUDM/glm-4-9b-chat`
- `VLM_MODEL_PATH`: used for VLM mode, defaults to `THUDM/glm-4v-9b`
The Chat model supports inference with [vLLM](https://github.com/vllm-project/vllm). To use it, install vLLM and set the environment variable `USE_VLLM=1`.
The Chat model also supports inference through the [OpenAI API](https://platform.openai.com/docs/api-reference/introduction). To use it, start `openai_api_server.py` under the `inference` directory and set the environment variable `USE_API=1`. This decouples the inference server from the demo server.
If you need to customize the Jupyter kernel, you can specify it with `export IPYKERNEL=<kernel_name>`.
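As a sketch of how these variables drive the demo (a restatement of the documented behaviour above, not the demo's actual code), the resolution roughly amounts to:
```python
import os

# Documented environment variables and their documented defaults; the demo's real lookup code may differ.
CHAT_MODEL_PATH = os.environ.get("CHAT_MODEL_PATH", "THUDM/glm-4-9b-chat")  # All Tools / document interpretation
VLM_MODEL_PATH = os.environ.get("VLM_MODEL_PATH", "THUDM/glm-4v-9b")        # VLM mode
USE_VLLM = os.environ.get("USE_VLLM") == "1"   # serve the Chat model with vLLM
USE_API = os.environ.get("USE_API") == "1"     # talk to an external openai_api_server instead of loading locally
IPYKERNEL = os.environ.get("IPYKERNEL", "glm-4-demo")

print(CHAT_MODEL_PATH, VLM_MODEL_PATH, USE_VLLM, USE_API, IPYKERNEL)
```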
## Usage
......
@@ -42,31 +42,26 @@ pnpm install
needs to use:
```ts
export default {
  BROWSER_TIMEOUT: 10000,
  BING_SEARCH_API_URL: 'https://api.bing.microsoft.com/v7.0',
  BING_SEARCH_API_KEY: '<PUT_YOUR_BING_SEARCH_KEY_HERE>',
  HOST: 'localhost',
  PORT: 3000,
};
```
2. The text-to-image (Wenshengtu) feature needs to call the CogView API. Modify `src/tools/config.py` to provide the [Zhipu AI Open Platform](https://open.bigmodel.cn) API Key required for the text-to-image feature:
```python
BROWSER_SERVER_URL = 'http://localhost:3000'
IPYKERNEL = 'glm-4-demo'
ZHIPU_AI_KEY = '<PUT_YOUR_ZHIPU_AI_KEY_HERE>'
COGVIEW_MODEL = 'cogview-3'
```
@@ -96,6 +91,8 @@ by `export *_MODEL_PATH=/path/to/model`. The models that can be specified include:
The Chat model supports inference using [vLLM](https://github.com/vllm-project/vllm). To use it, please install vLLM and set the environment variable `USE_VLLM=1`.
The Chat model also supports inference through the [OpenAI API](https://platform.openai.com/docs/api-reference/introduction). To use it, please run `openai_api_server.py` in `inference` and set the environment variable `USE_API=1`. This allows the inference server and the demo server to be deployed on different machines.
If you need to customize the Jupyter kernel, you can specify it by `export IPYKERNEL=<kernel_name>`.
## Usage
@@ -141,7 +138,7 @@ Users can upload documents and use the long text capability of GLM-4-9B to understand
pdf and other files.
+ Tool calls and system prompt words are not supported in this mode.
+ If the text is very long, the model may require a large amount of GPU memory. Please confirm your hardware configuration.
## Image Understanding Mode
......