Update DEPLOYMENT_en.md, MODEL_LICENSE, PROMPT.md, PROMPT_en.md, README.md,...

Update DEPLOYMENT_en.md, MODEL_LICENSE, PROMPT.md, PROMPT_en.md, README.md, README_en.md, requirements.txt, .gitignore, LICENSE, update_requirements.sh files Deleted lvzhen.log, basic_demo/cli_demo.py, basic_demo/cli_demo_bad_word_ids.py, basic_demo/infer_test.py, basic_demo/utils.py, basic_demo/vocab.txt, basic_demo/web_demo.py, basic_demo/web_demo2.py, composite_demo/.streamlit/config.toml, composite_demo/assets/demo.png, composite_demo/assets/emojis.png, composite_demo/assets/heart.png, composite_demo/assets/tool.png, composite_demo/README.md, composite_demo/README_en.md, composite_demo/client.py, composite_demo/conversation.py, composite_demo/demo_chat.py, composite_demo/demo_ci.py, composite_demo/demo_tool.py, composite_demo/main.py, composite_demo/requirements.txt, composite_demo/tool_registry.py, cookbook/data/toutiao_cat_data_example.txt, cookbook/accurate_prompt.ipynb, cookbook/finetune_muti_classfication.ipynb, finetune_basemodel_demo/scripts/finetune_lora.sh, finetune_basemodel_demo/scripts/formate_alpaca2jsonl.py, finetune_basemodel_demo/README.md, finetune_basemodel_demo/arguments.py, finetune_basemodel_demo/finetune.py, finetune_basemodel_demo/inference.py, finetune_basemodel_demo/preprocess_utils.py, finetune_basemodel_demo/requirements.txt, finetune_basemodel_demo/trainer.py, finetune_chatmodel_demo/AdvertiseGen/dev.json, finetune_chatmodel_demo/AdvertiseGen/train.json, finetune_chatmodel_demo/configs/deepspeed.json, finetune_chatmodel_demo/formatted_data/advertise_gen.jsonl, finetune_chatmodel_demo/formatted_data/tool_alpaca.jsonl, finetune_chatmodel_demo/scripts/finetune_ds.sh, finetune_chatmodel_demo/scripts/finetune_ds_multiturn.sh, finetune_chatmodel_demo/scripts/finetune_pt.sh, finetune_chatmodel_demo/scripts/finetune_pt_multiturn.sh, finetune_chatmodel_demo/scripts/format_advertise_gen.py, finetune_chatmodel_demo/scripts/format_tool_alpaca.py, finetune_chatmodel_demo/README.md, finetune_chatmodel_demo/arguments.py, 
finetune_chatmodel_demo/finetune.py, finetune_chatmodel_demo/inference.py, finetune_chatmodel_demo/preprocess_utils.py, finetune_chatmodel_demo/requirements.txt, finetune_chatmodel_demo/train_data.json, finetune_chatmodel_demo/trainer.py, langchain_demo/Tool/Calculator.py, langchain_demo/Tool/Calculator.yaml, langchain_demo/Tool/Weather.py, langchain_demo/Tool/arxiv_example.yaml, langchain_demo/Tool/weather.yaml, langchain_demo/ChatGLM3.py, langchain_demo/README.md, langchain_demo/main.py, langchain_demo/requirements.txt, langchain_demo/utils.py, media/GLM.png, media/cli.png, media/transformers.jpg, openai_api_demo/openai_api.py, openai_api_demo/openai_api_request.py, openai_api_demo/requirements.txt, openai_api_demo/utils.py, resources/WECHAT.md, resources/cli-demo.png, resources/code_en.gif, resources/heart.png, resources/tool.png, resources/tool_en.png, resources/web-demo.gif, resources/web-demo2.gif, resources/web-demo2.png, resources/wechat.jpg, tool_using/README.md, tool_using/README_en.md, tool_using/cli_demo_tool.py, tool_using/openai_api_demo.py, tool_using/requirements.txt, tool_using/test.py, tool_using/tool_register.py, Dockerfile, README_old.md, model.properties files

Commit dfe42ac1 authored May 10, 2024 by lvzhen
20 changed files
--- a/.gitignore
+++ b/.gitignore
+__pycache__
+# finetune_demo: generated & downloaded files
+finetune_demo/output
+finetune_demo/data
+finetune_demo/formatted_data
+ToolAlpaca/
+AdvertiseGen/
+*.gz
+*.idea
+.DS_Store
\ No newline at end of file
--- a/Dockerfile
+++ b/Dockerfile
-FROM image.sourcefind.cn:5000/dcu/admin/base/pytorch:1.13.1-centos7.6-dtk-23.04-py38-latest
-COPY requirements.txt requirements.txt
-RUN source /opt/dtk-23.04/env.sh
-RUN cp /usr/share/zoneinfo/Asia/Shanghai /etc/localtime && echo 'Asia/Shanghai' >/etc/timezone 
-ENV LANG C.UTF-8
-RUN pip install -r requirements.txt -i http://mirrors.aliyun.com/pypi/simple/ --trusted-host mirrors.aliyun.com
--- a/LICENSE
+++ b/LICENSE
+                                 Apache License
+                           Version 2.0, January 2004
+                        http://www.apache.org/licenses/
+   TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
+   1. Definitions.
+      "License" shall mean the terms and conditions for use, reproduction,
+      and distribution as defined by Sections 1 through 9 of this document.
+      "Licensor" shall mean the copyright owner or entity authorized by
+      the copyright owner that is granting the License.
+      "Legal Entity" shall mean the union of the acting entity and all
+      other entities that control, are controlled by, or are under common
+      control with that entity. For the purposes of this definition,
+      "control" means (i) the power, direct or indirect, to cause the
+      direction or management of such entity, whether by contract or
+      otherwise, or (ii) ownership of fifty percent (50%) or more of the
+      outstanding shares, or (iii) beneficial ownership of such entity.
+      "You" (or "Your") shall mean an individual or Legal Entity
+      exercising permissions granted by this License.
+      "Source" form shall mean the preferred form for making modifications,
+      including but not limited to software source code, documentation
+      source, and configuration files.
+      "Object" form shall mean any form resulting from mechanical
+      transformation or translation of a Source form, including but
+      not limited to compiled object code, generated documentation,
+      and conversions to other media types.
+      "Work" shall mean the work of authorship, whether in Source or
+      Object form, made available under the License, as indicated by a
+      copyright notice that is included in or attached to the work
+      (an example is provided in the Appendix below).
+      "Derivative Works" shall mean any work, whether in Source or Object
+      form, that is based on (or derived from) the Work and for which the
+      editorial revisions, annotations, elaborations, or other modifications
+      represent, as a whole, an original work of authorship. For the purposes
+      of this License, Derivative Works shall not include works that remain
+      separable from, or merely link (or bind by name) to the interfaces of,
+      the Work and Derivative Works thereof.
+      "Contribution" shall mean any work of authorship, including
+      the original version of the Work and any modifications or additions
+      to that Work or Derivative Works thereof, that is intentionally
+      submitted to Licensor for inclusion in the Work by the copyright owner
+      or by an individual or Legal Entity authorized to submit on behalf of
+      the copyright owner. For the purposes of this definition, "submitted"
+      means any form of electronic, verbal, or written communication sent
+      to the Licensor or its representatives, including but not limited to
+      communication on electronic mailing lists, source code control systems,
+      and issue tracking systems that are managed by, or on behalf of, the
+      Licensor for the purpose of discussing and improving the Work, but
+      excluding communication that is conspicuously marked or otherwise
+      designated in writing by the copyright owner as "Not a Contribution."
+      "Contributor" shall mean Licensor and any individual or Legal Entity
+      on behalf of whom a Contribution has been received by Licensor and
+      subsequently incorporated within the Work.
+   2. Grant of Copyright License. Subject to the terms and conditions of
+      this License, each Contributor hereby grants to You a perpetual,
+      worldwide, non-exclusive, no-charge, royalty-free, irrevocable
+      copyright license to reproduce, prepare Derivative Works of,
+      publicly display, publicly perform, sublicense, and distribute the
+      Work and such Derivative Works in Source or Object form.
+   3. Grant of Patent License. Subject to the terms and conditions of
+      this License, each Contributor hereby grants to You a perpetual,
+      worldwide, non-exclusive, no-charge, royalty-free, irrevocable
+      (except as stated in this section) patent license to make, have made,
+      use, offer to sell, sell, import, and otherwise transfer the Work,
+      where such license applies only to those patent claims licensable
+      by such Contributor that are necessarily infringed by their
+      Contribution(s) alone or by combination of their Contribution(s)
+      with the Work to which such Contribution(s) was submitted. If You
+      institute patent litigation against any entity (including a
+      cross-claim or counterclaim in a lawsuit) alleging that the Work
+      or a Contribution incorporated within the Work constitutes direct
+      or contributory patent infringement, then any patent licenses
+      granted to You under this License for that Work shall terminate
+      as of the date such litigation is filed.
+   4. Redistribution. You may reproduce and distribute copies of the
+      Work or Derivative Works thereof in any medium, with or without
+      modifications, and in Source or Object form, provided that You
+      meet the following conditions:
+      (a) You must give any other recipients of the Work or
+          Derivative Works a copy of this License; and
+      (b) You must cause any modified files to carry prominent notices
+          stating that You changed the files; and
+      (c) You must retain, in the Source form of any Derivative Works
+          that You distribute, all copyright, patent, trademark, and
+          attribution notices from the Source form of the Work,
+          excluding those notices that do not pertain to any part of
+          the Derivative Works; and
+      (d) If the Work includes a "NOTICE" text file as part of its
+          distribution, then any Derivative Works that You distribute must
+          include a readable copy of the attribution notices contained
+          within such NOTICE file, excluding those notices that do not
+          pertain to any part of the Derivative Works, in at least one
+          of the following places: within a NOTICE text file distributed
+          as part of the Derivative Works; within the Source form or
+          documentation, if provided along with the Derivative Works; or,
+          within a display generated by the Derivative Works, if and
+          wherever such third-party notices normally appear. The contents
+          of the NOTICE file are for informational purposes only and
+          do not modify the License. You may add Your own attribution
+          notices within Derivative Works that You distribute, alongside
+          or as an addendum to the NOTICE text from the Work, provided
+          that such additional attribution notices cannot be construed
+          as modifying the License.
+      You may add Your own copyright statement to Your modifications and
+      may provide additional or different license terms and conditions
+      for use, reproduction, or distribution of Your modifications, or
+      for any such Derivative Works as a whole, provided Your use,
+      reproduction, and distribution of the Work otherwise complies with
+      the conditions stated in this License.
+   5. Submission of Contributions. Unless You explicitly state otherwise,
+      any Contribution intentionally submitted for inclusion in the Work
+      by You to the Licensor shall be under the terms and conditions of
+      this License, without any additional terms or conditions.
+      Notwithstanding the above, nothing herein shall supersede or modify
+      the terms of any separate license agreement you may have executed
+      with Licensor regarding such Contributions.
+   6. Trademarks. This License does not grant permission to use the trade
+      names, trademarks, service marks, or product names of the Licensor,
+      except as required for reasonable and customary use in describing the
+      origin of the Work and reproducing the content of the NOTICE file.
+   7. Disclaimer of Warranty. Unless required by applicable law or
+      agreed to in writing, Licensor provides the Work (and each
+      Contributor provides its Contributions) on an "AS IS" BASIS,
+      WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
+      implied, including, without limitation, any warranties or conditions
+      of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
+      PARTICULAR PURPOSE. You are solely responsible for determining the
+      appropriateness of using or redistributing the Work and assume any
+      risks associated with Your exercise of permissions under this License.
+   8. Limitation of Liability. In no event and under no legal theory,
+      whether in tort (including negligence), contract, or otherwise,
+      unless required by applicable law (such as deliberate and grossly
+      negligent acts) or agreed to in writing, shall any Contributor be
+      liable to You for damages, including any direct, indirect, special,
+      incidental, or consequential damages of any character arising as a
+      result of this License or out of the use or inability to use the
+      Work (including but not limited to damages for loss of goodwill,
+      work stoppage, computer failure or malfunction, or any and all
+      other commercial damages or losses), even if such Contributor
+      has been advised of the possibility of such damages.
+   9. Accepting Warranty or Additional Liability. While redistributing
+      the Work or Derivative Works thereof, You may choose to offer,
+      and charge a fee for, acceptance of support, warranty, indemnity,
+      or other liability obligations and/or rights consistent with this
+      License. However, in accepting such obligations, You may act only
+      on Your own behalf and on Your sole responsibility, not on behalf
+      of any other Contributor, and only if You agree to indemnify,
+      defend, and hold each Contributor harmless for any liability
+      incurred by, or claims asserted against, such Contributor by reason
+      of your accepting any such warranty or additional liability.
+   END OF TERMS AND CONDITIONS
+   APPENDIX: How to apply the Apache License to your work.
+      To apply the Apache License to your work, attach the following
+      boilerplate notice, with the fields enclosed by brackets "[]"
+      replaced with your own identifying information. (Don't include
+      the brackets!)  The text should be enclosed in the appropriate
+      comment syntax for the file format. We also recommend that a
+      file or class name and description of purpose be included on the
+      same "printed page" as the copyright notice for easier
+      identification within third-party archives.
+   Copyright 2024 ChatGLM team @ Zhipu AI
+   Licensed under the Apache License, Version 2.0 (the "License");
+   you may not use this file except in compliance with the License.
+   You may obtain a copy of the License at
+       http://www.apache.org/licenses/LICENSE-2.0
+   Unless required by applicable law or agreed to in writing, software
+   distributed under the License is distributed on an "AS IS" BASIS,
+   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+   See the License for the specific language governing permissions and
+   limitations under the License.
--- a/PROMPT.md
+++ b/PROMPT.md
@@ -132,6 +132,7 @@ data[:5]
 <|assistant|>
 The file appears to contain metadata about certain entries, each with the following fields:
 - `file_name`: file name
 - `name`: name
 - `type`: type (for example, "survivor" or "killer")

--- a/PROMPT_en.md
+++ b/PROMPT_en.md
@@ -29,7 +29,7 @@ Where `<|role|>` part is represented in a special token,  which can’t be encod
 ### Example Scenarios
-For better readablity, an extra `\n` is added before each role special token. This extra `\n` should not be added in actual use and tokenizer implementation.
+For better readability, an extra `\n` is added before each role special token. This extra `\n` should not be added in actual use and tokenizer implementation.
 #### Multi-turn Dialogue
 * There are only three roles: `<|user|>`, `<|assistant|>`, and `<|system|>`.

--- a/README.md
+++ b/README.md
--- a/README_en.md
+++ b/README_en.md
@@ -10,15 +10,36 @@
 📍Experience the larger-scale ChatGLM model at <a href="https://www.chatglm.cn">chatglm.cn</a>
 </p>
-## Introduction
+📔
+About `ChatGLM3-6B`
+For more detailed usage information, please refer to: 
+ [ChatGLM3 technical documentation](https://lslfd0slxc.feishu.cn/wiki/WvQbwIJ9tiPAxGk8ywDck6yfnof?from=from_copylink)
+ [Bilibili video](https://www.bilibili.com/video/BV1uC4y1J7yA)
+ [YouTube video](https://www.youtube.com/watch?v=Pw9PB6R7ORA)
-ChatGLM3 is a new generation of pre-trained dialogue models jointly released by Zhipu AI and Tsinghua KEG. ChatGLM3-6B is the open-source model in the ChatGLM3 series, maintaining many excellent features of the first two generations such as smooth dialogue and low deployment threshold, while introducing the following features:
+## GLM-4 Introduction
+We have released the latest **GLM-4** model, which achieves new breakthroughs on multiple metrics. You can try it directly through the following two channels.
+ [Chatglm Qingyan](https://www.chatglm.cn) To experience the latest version of GLM-4, including **GLM, all tools** and other functions, download the Zhipu Qingyan app
+  or use the [web page](https://www.chatglm.cn).
+ [API Platform](https://open.bigmodel.cn/) The new-generation API platform has been launched. On the platform you can directly
+  try new models such as `GLM-4`, `GLM-3-Turbo`, `CharacterGLM-3`, and `CogView-3`.
+  Two of these models, `GLM-4` and `GLM-3-Turbo`, support new functions such as `system prompt`, `function call`, `retrieval`, and `Web_Search`.
+ [GLM4 API Open Source Tutorial](https://github.com/MetaGLM/glm-cookbook/) - A tutorial and basic application guide for the GLM-4 API. You are invited to explore and experiment.
+  For API-related inquiries, refer to this open-source tutorial, or utilize the [GLM-4 API AI Assistant](https://open.bigmodel.cn/shareapp/v1/?share_code=sQwt5qyqYVaNh1O_87p8O) for assistance with common questions.
+-----
+## ChatGLM3 Introduction
+**ChatGLM3** is a generation of pre-trained dialogue models jointly released by Zhipu AI and Tsinghua KEG. ChatGLM3-6B is the open-source model in the ChatGLM3 series, maintaining many excellent features of the first two generations such as smooth dialogue and low deployment threshold, while introducing the following features:
 1. **Stronger Base Model:** The base model of ChatGLM3-6B, ChatGLM3-6B-Base, adopts a more diverse training dataset, more sufficient training steps, and a more reasonable training strategy. Evaluations on datasets from various perspectives such as semantics, mathematics, reasoning, code, and knowledge show that **ChatGLM3-6B-Base has the strongest performance among base models below 10B**.
-2. **More Complete Function Support:** ChatGLM3-6B adopts a newly designed [Prompt format](PROMPT_en.md), supporting multi-turn dialogues as usual. It also natively supports [tool invocation](tool_using/README_en.md) (Function Call), code execution (Code Interpreter), and Agent tasks in complex scenarios.
+2. **More Complete Function Support:** ChatGLM3-6B adopts a newly designed [Prompt format](PROMPT_en.md), supporting multi-turn dialogues as usual. It also natively supports [tool invocation](tools_using_demo/README_en.md) (Function Call), code execution (Code Interpreter), and Agent tasks in complex scenarios.
-3. **More Comprehensive Open-source Series:** In addition to the dialogue model [ChatGLM3-6B](https://huggingface.co/THUDM/chatglm3-6b), the basic model [ChatGLM3-6B-Base](https://huggingface.co/THUDM/chatglm3-6b-base), and the long-text dialogue model [ChatGLM3-6B-32K](https://huggingface.co/THUDM/chatglm3-6b-32k) have also been open-sourced. All these weights are **fully open** for academic research, and **free commercial use is also allowed** after registration via a [questionnaire](https://open.bigmodel.cn/mla/form).
+3. **More Comprehensive Open-source Series:** In addition to the dialogue model [ChatGLM3-6B](https://huggingface.co/THUDM/chatglm3-6b), the base model [ChatGLM3-6B-Base](https://huggingface.co/THUDM/chatglm3-6b-base), the long-text dialogue model [ChatGLM3-6B-32K](https://huggingface.co/THUDM/chatglm3-6b-32k), and [ChatGLM3-6B-128K](https://huggingface.co/THUDM/chatglm3-6b-128k), which further strengthens long-text understanding, have also been open-sourced. All these weights are **fully open** for academic research, and **free commercial use is also allowed** after registration via a [questionnaire](https://open.bigmodel.cn/mla/form).
 -----
@@ -28,17 +49,31 @@ Although every effort has been made to ensure the compliance and accuracy of the
 ## Model List
-| Model | Seq Length |                                                              Download                                                               
+|      Model       | Seq Length |                                                                              Download                                                                              
-| :---: |:---------------------------:|:-----------------------------------------------------------------------------------------------------------------------------------:
+|:----------------:|:----------:|:------------------------------------------------------------------------------------------------------------------------------------------------------------------:
-| ChatGLM3-6B | 8k |      [HuggingFace](https://huggingface.co/THUDM/chatglm3-6b) \| [ModelScope](https://modelscope.cn/models/ZhipuAI/chatglm3-6b)      
+|   ChatGLM3-6B    |     8k     |                     [HuggingFace](https://huggingface.co/THUDM/chatglm3-6b) \| [ModelScope](https://modelscope.cn/models/ZhipuAI/chatglm3-6b)                      
-| ChatGLM3-6B-Base | 8k | [HuggingFace](https://huggingface.co/THUDM/chatglm3-6b-base) \| [ModelScope](https://modelscope.cn/models/ZhipuAI/chatglm3-6b-base) 
+| ChatGLM3-6B-Base |     8k     |                [HuggingFace](https://huggingface.co/THUDM/chatglm3-6b-base) \| [ModelScope](https://modelscope.cn/models/ZhipuAI/chatglm3-6b-base)                 
-| ChatGLM3-6B-32K | 32k |                                   [HuggingFace](https://huggingface.co/THUDM/chatglm3-6b-32k) \| [ModelScope](https://modelscope.cn/models/ZhipuAI/chatglm3-6b-32k)                                    
+| ChatGLM3-6B-32K  |    32k     |                 [HuggingFace](https://huggingface.co/THUDM/chatglm3-6b-32k) \| [ModelScope](https://modelscope.cn/models/ZhipuAI/chatglm3-6b-32k)                  
+| ChatGLM3-6B-128K |    128k    |                 [HuggingFace](https://huggingface.co/THUDM/chatglm3-6b-128k) \| [ModelScope](https://modelscope.cn/models/ZhipuAI/chatglm3-6b-128k)                 
 ## Projects
-Open source projects that accelerate ChatGLM3:
+The following open source repositories provide in-depth support for the ChatGLM3-6B model and are worth exploring.
+Inference acceleration:
 * [chatglm.cpp](https://github.com/li-plus/chatglm.cpp): Real-time inference on your laptop accelerated by quantization, similar to llama.cpp.
 * [ChatGLM3-TPU](https://github.com/sophgo/ChatGLM3-TPU): Using the TPU accelerated inference solution, it runs about 7.5 token/s in real time on the end-side chip BM1684X (16T@FP16, 16G DDR).
+* [TensorRT-LLM](https://github.com/NVIDIA/TensorRT-LLM/tree/main): A high-performance GPU-accelerated inference solution developed by NVIDIA. You can refer to these [steps](./tensorrt_llm_demo/README.md) to deploy ChatGLM3.
+* [OpenVINO](https://github.com/openvinotoolkit): A high-performance CPU- and GPU-accelerated inference solution developed by Intel. You can refer to these [steps](./Intel_device_demo/openvino_demo/README.md) to deploy the ChatGLM3-6B model.
+Efficient fine-tuning:
+* [LLaMA-Factory](https://github.com/hiyouga/LLaMA-Factory): An excellent, easy-to-use and efficient fine-tuning framework.
+Application framework:
+* [LangChain-Chatchat](https://github.com/chatchat-space/Langchain-Chatchat): An open source, offline-deployable retrieval-augmented generation (RAG) knowledge base project built on large language models such as ChatGLM and application frameworks such as Langchain.
+* [BISHENG](https://github.com/dataelement/bisheng): An open-source platform for developing LLM applications. It accelerates LLM application development and helps users move to the next generation of application development.
 ## Evaluation Results
 ### Typical Tasks
@@ -75,10 +110,7 @@ Then use pip to install the dependencies:
 ```
 pip install -r requirements.txt
 ```
-+ The `transformers` library version should be `4.30.2` and above, and `torch` library should be 2.0 and above to obtain the best inference performance.
 + In order to ensure that the version of `torch` is correct, please strictly follow the instructions of [official documentation](https://pytorch.org/get-started/locally/) for installation.
-+ The `gradio` library version should be the `3.x` version.
 ### Integrated Demo
@@ -128,21 +160,21 @@ git clone https://huggingface.co/THUDM/chatglm3-6b
 If the download from HuggingFace is slow, you can also download it from [ModelScope](https://modelscope.cn/models/ZhipuAI/chatglm3-6b).
 # Model Fine-tuning
-Please refer to the dialog model fine-tuning [ChatGLM3-6B fine-tuning example](finetune_chatmodel_demo/README.md), or the base model fine-tuning [ChatGLM3-6B-base fine-tuning example](finetune_basemodel_demo/README.md).
-Please note that different fine-tuning scripts correspond to different models. Please select the corresponding model according to your needs.
+We provide a basic fine-tuning framework for ChatGLM3-6B. You can use it to fine-tune the model on your own dataset. For more details, please refer to [Fine-tuning Demo](finetune_demo/README_en.md).
 ### Web-based Dialogue Demo
 ![web-demo](resources/web-demo.gif)
 You can launch a web-based demo using Gradio with the following command:
 ```shell
-python web_demo.py
+python web_demo_gradio.py
 ```
 ![web-demo](resources/web-demo2.png)
 You can launch a web-based demo using Streamlit with the following command:
 ```shell
-streamlit run web_demo2.py
+streamlit run web_demo_streamlit.py
 ```
 The web-based demo will run a Web Server and output an address. You can use it by opening the output address in a browser. Based on tests, the web-based demo using Streamlit runs more smoothly.
@@ -159,19 +191,34 @@ python cli_demo.py
 The program will interact in the command line, enter instructions in the command line and hit enter to generate a response. Enter `clear` to clear the dialogue history, enter `stop` to terminate the program.
-### API Deployment
+### OpenAI API / Zhipu API Demo
-Thanks to [@xusenlinzy](https://github.com/xusenlinzy) for implementing the OpenAI format streaming API deployment, which can serve as the backend for any ChatGPT-based application, such as [ChatGPT-Next-Web](https://github.com/Yidadaa/ChatGPT-Next-Web). You can deploy it by running [openai_api.py](openai_api_demo/openai_api.py) in the repository:
+We have released open source model API deployment code in OpenAI / ZhipuAI format, which can be used as the backend of any ChatGPT-based application.
+Currently, you can deploy it by running [api_server.py](openai_api_demo/api_server.py) in the repository:
 ```shell
 cd openai_api_demo
-python openai_api.py
+python api_server.py
 ```
-Also, we have written a sample code to test the performance of the API calls. This can be tested by running [openai_api_request.py](openai_api_demo/openai_api_request.py) in the repository
+We also provide sample code to test the performance of API calls.
+ OpenAI test script: [openai_api_request.py](openai_api_demo/openai_api_request.py)
+ ZhipuAI test script: [zhipu_api_request.py](openai_api_demo/zhipu_api_request.py)
 + Test with Curl
+ chat Curl test
+```shell
+curl -X POST "http://127.0.0.1:8000/v1/chat/completions" \
+-H "Content-Type: application/json" \
+-d "{\"model\": \"chatglm3-6b\", \"messages\": [{\"role\": \"system\", \"content\": \"You are ChatGLM3, a large language model trained by Zhipu.AI. Follow the user's instructions carefully. Respond using markdown.\"}, {\"role\": \"user\", \"content\": \"你好，给我讲一个故事，大概100字\"}], \"stream\": false, \"max_tokens\": 100, \"temperature\": 0.8, \"top_p\": 0.8}"
+```
+ agent-chat Curl test
 ```shell
 curl -X POST "http://127.0.0.1:8000/v1/chat/completions" \
-H "Content-Type: application/json" \\
+-H "Content-Type: application/json" \
-d "{\"model\": \"chatglm3-6b\", \"messages\": [{\"role\": \"system\", \"content\": \"You are ChatGLM3, a large language model trained by Zhipu. Follow the user's instructions carefully. Respond using markdown.\"}, {\"role\": \"user\", \"content\": \"Hello, tell me a story, about 100 words\"}], \"stream\": false, \"max_tokens\": 100, \"temperature\": 0.8, \"top_p\": 0.8}"
+-d "{\"model\": \"chatglm3-6b\", \"agent\": true, \"messages\": [{\"role\": \"user\", \"content\": \"37乘以8加7除2等于多少？\"}], \"stream\": true, \"max_tokens\": 100, \"temperature\": 0.8, \"top_p\": 0.8}"
 ```
 + Testing with Python
 ```shell
 cd openai_api_demo
@@ -181,7 +228,7 @@ If the test is successful, the model should return a story.
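The curl request above can also be issued from Python. The following is a minimal sketch using only the standard library, not the repository's own test script: the helper names `build_chat_request` and `send` are hypothetical, and the address is assumed to be the same `127.0.0.1:8000` endpoint used in the curl examples.

```python
import json
import urllib.request

# Assumption: the server started by api_server.py listens on this address,
# as in the curl examples above.
BASE_URL = "http://127.0.0.1:8000/v1/chat/completions"


def build_chat_request(prompt, system_prompt=None, max_tokens=100,
                       temperature=0.8, top_p=0.8, url=BASE_URL):
    """Build (but do not send) a non-streaming chat completion request."""
    messages = []
    if system_prompt:
        messages.append({"role": "system", "content": system_prompt})
    messages.append({"role": "user", "content": prompt})
    payload = {
        "model": "chatglm3-6b",
        "messages": messages,
        "stream": False,
        "max_tokens": max_tokens,
        "temperature": temperature,
        "top_p": top_p,
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request, timeout=60):
    """Send the request and return the decoded JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With the server running, `send(build_chat_request("Hello, tell me a story, about 100 words"))` should return the parsed JSON response. If the server follows the OpenAI response format, the generated text would be found under `choices[0].message.content`.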
 ### Tool Invocation
-For methods of tool invocation, please refer to [Tool Invocation](tool_using/README_en.md).
+For methods of tool invocation, please refer to [Tool Invocation](tools_using_demo/README_en.md).
 ## Low-Cost Deployment
@@ -217,15 +264,18 @@ Loading the half-precision ChatGLM3-6B model requires about 13GB of memory. Mach
 ### Multi-GPU Deployment
-If you have multiple GPUs, but each GPU's VRAM size is not enough to accommodate the complete model, then the model can be split across multiple GPUs. First, install accelerate: `pip install accelerate`, and then load the model through the following methods:
+If you have multiple GPUs, but each GPU's VRAM size is not enough to accommodate the complete model, then the model can be split across multiple GPUs. First, install accelerate: `pip install accelerate`, and then load the model as usual.
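The README does not spell out what loading "as usual" looks like on several GPUs. One plausible reading, sketched below, is to pass `device_map="auto"` to `from_pretrained`, which `transformers` supports once `accelerate` is installed. The helper names `build_load_kwargs` and `load_model` are hypothetical and not part of the repository.

```python
def build_load_kwargs(multi_gpu: bool) -> dict:
    """Keyword arguments for AutoModel.from_pretrained (hypothetical helper)."""
    kwargs = {"trust_remote_code": True}
    if multi_gpu:
        # Assumption: let accelerate decide how to place layers across the visible GPUs.
        kwargs["device_map"] = "auto"
    return kwargs


def load_model(model_path: str = "THUDM/chatglm3-6b", multi_gpu: bool = True):
    """Load the model, sharding it across GPUs when multi_gpu is True."""
    # Imported lazily so the helper above can be used without transformers installed.
    from transformers import AutoModel

    return AutoModel.from_pretrained(model_path, **build_load_kwargs(multi_gpu)).eval()
```

Calling `load_model()` on a machine with two or more GPUs would then split the weights across them automatically, without a hand-written device map.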
-```python
-from utils import load_model_on_gpus
-model = load_model_on_gpus("THUDM/chatglm3-6b", num_gpus=2)
+### OpenVINO Demo
-```
+ChatGLM3-6B supports accelerated inference with the OpenVINO toolkit,
+which gives a notable inference speedup on Intel CPU and GPU devices. For specific usage, please refer to [OpenVINO Demo](Intel_device_demo/openvino_demo/README.md).
+### TensorRT-LLM Demo
-This allows the model to be deployed on two GPUs for inference. You can change `num_gpus` to the number of GPUs you want to use. It is evenly split by default, but you can also pass the `device_map` parameter to specify it yourself.
+ChatGLM3-6B now supports accelerated inference using the TensorRT-LLM toolkit, significantly improving model inference speed. For specific usage, please refer to the [TensorRT-LLM Demo](tensorrt_llm_demo/tensorrt_llm_cli_demo.py) and the official technical documentation.
 ## Citation

--- a/README_old.md
+++ b/README_old.md
-# ChatGLM3
-<p align="center">
-🤗 <a href="https://huggingface.co/THUDM/chatglm3-6b" target="_blank">HF Repo</a> • 🤖 <a href="https://modelscope.cn/models/ZhipuAI/chatglm3-6b" target="_blank">ModelScope</a> • 🐦 <a href="https://twitter.com/thukeg" target="_blank">Twitter</a> • 📃 <a href="https://arxiv.org/abs/2103.10360" target="_blank">[GLM@ACL 22]</a> <a href="https://github.com/THUDM/GLM" target="_blank">[GitHub]</a> • 📃 <a href="https://arxiv.org/abs/2210.02414" target="_blank">[GLM-130B@ICLR 23]</a> <a href="https://github.com/THUDM/GLM-130B" target="_blank">[GitHub]</a> <br>
-</p>
-<p align="center">
-    👋 加入我们的 <a href="https://join.slack.com/t/chatglm/shared_invite/zt-25ti5uohv-A_hs~am_D3Q8XPZMpj7wwQ" target="_blank">Slack</a> 和 <a href="resources/WECHAT.md" target="_blank">微信</a>
-</p>
-<p align="center">
-📍在 <a href="https://www.chatglm.cn">chatglm.cn</a> 体验更大规模的 ChatGLM 模型。
-</p>
-[Read this in English.](./README_en.md)
-📔 更为详细的使用信息，可以参考：[ChatGLM3技术文档](https://lslfd0slxc.feishu.cn/wiki/WvQbwIJ9tiPAxGk8ywDck6yfnof?from=from_copylink)
-## 介绍
-ChatGLM3 是智谱AI和清华大学 KEG 实验室联合发布的新一代对话预训练模型。ChatGLM3-6B 是 ChatGLM3 系列中的开源模型，在保留了前两代模型对话流畅、部署门槛低等众多优秀特性的基础上，ChatGLM3-6B 引入了如下特性：
-1. **更强大的基础模型：** ChatGLM3-6B 的基础模型 ChatGLM3-6B-Base 采用了更多样的训练数据、更充分的训练步数和更合理的训练策略。在语义、数学、推理、代码、知识等不同角度的数据集上测评显示，**ChatGLM3-6B-Base 具有在 10B 以下的基础模型中最强的性能**。
-2. **更完整的功能支持：** ChatGLM3-6B 采用了全新设计的 [Prompt 格式](PROMPT.md)，除正常的多轮对话外。同时原生支持[工具调用](tool_using/README.md)（Function Call）、代码执行（Code Interpreter）和 Agent 任务等复杂场景。
-3. **更全面的开源序列：** 除了对话模型 [ChatGLM3-6B](https://huggingface.co/THUDM/chatglm3-6b) 外，还开源了基础模型 [ChatGLM3-6B-Base](https://huggingface.co/THUDM/chatglm3-6b-base)、长文本对话模型 [ChatGLM3-6B-32K](https://huggingface.co/THUDM/chatglm3-6b-32k)。以上所有权重对学术研究**完全开放**，在填写[问卷](https://open.bigmodel.cn/mla/form)进行登记后**亦允许免费商业使用**。
-----
-ChatGLM3 开源模型旨在与开源社区一起推动大模型技术发展，恳请开发者和大家遵守[开源协议](MODEL_LICENSE)，勿将开源模型和代码及基于开源项目产生的衍生物用于任何可能给国家和社会带来危害的用途以及用于任何未经过安全评估和备案的服务。目前，本项目团队未基于 **ChatGLM3 开源模型**开发任何应用，包括网页端、安卓、苹果 iOS 及 Windows App 等应用。
-尽管模型在训练的各个阶段都尽力确保数据的合规性和准确性，但由于 ChatGLM3-6B 模型规模较小，且模型受概率随机性因素影响，无法保证输出内容的准确。同时模型的输出容易被用户的输入误导。**本项目不承担开源模型和代码导致的数据安全、舆情风险或发生任何模型被误导、滥用、传播、不当利用而产生的风险和责任。**
-## 模型列表
-| Model | Seq Length |                                                              Download                                                               
-| :---: |:---------------------------:|:-----------------------------------------------------------------------------------------------------------------------------------:
-| ChatGLM3-6B | 8k |      [HuggingFace](https://huggingface.co/THUDM/chatglm3-6b) \| [ModelScope](https://modelscope.cn/models/ZhipuAI/chatglm3-6b)      
-| ChatGLM3-6B-Base | 8k | [HuggingFace](https://huggingface.co/THUDM/chatglm3-6b-base) \| [ModelScope](https://modelscope.cn/models/ZhipuAI/chatglm3-6b-base) 
-| ChatGLM3-6B-32K | 32k |                                   [HuggingFace](https://huggingface.co/THUDM/chatglm3-6b-32k) \| [ModelScope](https://modelscope.cn/models/ZhipuAI/chatglm3-6b-32k)                                    
-## 友情链接
-对 ChatGLM3 进行加速的开源项目：
-* [chatglm.cpp](https://github.com/li-plus/chatglm.cpp): 类似 llama.cpp 的量化加速推理方案，实现笔记本上实时对话
-* [ChatGLM3-TPU](https://github.com/sophgo/ChatGLM3-TPU): 采用TPU加速推理方案，在算能端侧芯片BM1684X（16T@FP16，内存16G）上实时运行约7.5 token/s
-## 评测结果
-### 典型任务
-我们选取了 8 个中英文典型数据集，在 ChatGLM3-6B (base) 版本上进行了性能测试。
-| Model            | GSM8K | MATH | BBH  | MMLU | C-Eval | CMMLU | MBPP | AGIEval |
-|------------------|:-----:|:----:|:----:|:----:|:------:|:-----:|:----:|:-------:|
-| ChatGLM2-6B-Base | 32.4  | 6.5  | 33.7 | 47.9 |  51.7  | 50.0  |  -   |    -    |
-| Best Baseline    | 52.1  | 13.1 | 45.0 | 60.1 |  63.5  | 62.2  | 47.5 |  45.8   |
-| ChatGLM3-6B-Base | 72.3  | 25.7 | 66.1 | 61.4 |  69.0  | 67.5  | 52.4 |  53.7   |
-> Best Baseline 指的是截止 2023年10月27日、模型参数在 10B 以下、在对应数据集上表现最好的预训练模型，不包括只针对某一项任务训练而未保持通用能力的模型。
-> 对 ChatGLM3-6B-Base 的测试中，BBH 采用 3-shot 测试，需要推理的 GSM8K、MATH 采用 0-shot CoT 测试，MBPP 采用 0-shot 生成后运行测例计算 Pass@1 ，其他选择题类型数据集均采用 0-shot 测试。
-我们在多个长文本应用场景下对 ChatGLM3-6B-32K 进行了人工评估测试。与二代模型相比，其效果平均提升了超过 50%。在论文阅读、文档摘要和财报分析等应用中，这种提升尤为显著。此外，我们还在 LongBench 评测集上对模型进行了测试，具体结果如下表所示
-| Model                |  平均 |  Summary | Single-Doc QA |  Multi-Doc QA | Code | Few-shot | Synthetic | 
-|----------------------|:-----:|:----:|:----:|:----:|:------:|:-----:|:-----:|
-| ChatGLM2-6B-32K   |  41.5 | 24.8 | 37.6 | 34.7 |  52.8  |  51.3 | 47.7 | 
-| ChatGLM3-6B-32K   |  50.2 | 26.6 | 45.8 | 46.1 |  56.2  |  61.2 | 65 |
-## 使用方式
-### 环境安装
-首先需要下载本仓库：
-```shell
-git clone https://github.com/THUDM/ChatGLM3
-cd ChatGLM3
-```
-然后使用 pip 安装依赖：
-```
-pip install -r requirements.txt
-```
-+ `transformers` 库版本应该 `4.30.2` 以及以上的版本 ，`torch` 库版本应为 2.0 及以上的版本，以获得最佳的推理性能。
-+ 为了保证 `torch` 的版本正确，请严格按照 [官方文档](https://pytorch.org/get-started/locally/) 的说明安装。
-+ `gradio` 库版本应该为 `3.x` 的版本。
-### 综合 Demo
-我们提供了一个集成以下三种功能的综合 Demo，运行方法请参考 [综合 Demo](composite_demo/README.md)
- Chat: 对话模式，在此模式下可以与模型进行对话。
- Tool: 工具模式，模型除了对话外，还可以通过工具进行其他操作。
-    <img src="resources/tool.png" width="400">
- Code Interpreter: 代码解释器模式，模型可以在一个 Jupyter 环境中执行代码并获取结果，以完成复杂任务。
-    <img src="resources/heart.png" width="400">
-### 代码调用 
-可以通过如下代码调用 ChatGLM 模型来生成对话：
-```python
->>> from transformers import AutoTokenizer, AutoModel
->>> tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm3-6b", trust_remote_code=True)
->>> model = AutoModel.from_pretrained("THUDM/chatglm3-6b", trust_remote_code=True, device='cuda')
->>> model = model.eval()
->>> response, history = model.chat(tokenizer, "你好", history=[])
->>> print(response)
-你好👋!我是人工智能助手 ChatGLM3-6B,很高兴见到你,欢迎问我任何问题。
->>> response, history = model.chat(tokenizer, "晚上睡不着应该怎么办", history=history)
->>> print(response)
-晚上睡不着可能会让你感到焦虑或不舒服,但以下是一些可以帮助你入睡的方法:
-1. 制定规律的睡眠时间表:保持规律的睡眠时间表可以帮助你建立健康的睡眠习惯,使你更容易入睡。尽量在每天的相同时间上床,并在同一时间起床。
-2. 创造一个舒适的睡眠环境:确保睡眠环境舒适,安静,黑暗且温度适宜。可以使用舒适的床上用品,并保持房间通风。
-3. 放松身心:在睡前做些放松的活动,例如泡个热水澡,听些轻柔的音乐,阅读一些有趣的书籍等,有助于缓解紧张和焦虑,使你更容易入睡。
-4. 避免饮用含有咖啡因的饮料:咖啡因是一种刺激性物质,会影响你的睡眠质量。尽量避免在睡前饮用含有咖啡因的饮料,例如咖啡,茶和可乐。
-5. 避免在床上做与睡眠无关的事情:在床上做些与睡眠无关的事情,例如看电影,玩游戏或工作等,可能会干扰你的睡眠。
-6. 尝试呼吸技巧:深呼吸是一种放松技巧,可以帮助你缓解紧张和焦虑,使你更容易入睡。试着慢慢吸气,保持几秒钟,然后缓慢呼气。
-如果这些方法无法帮助你入睡,你可以考虑咨询医生或睡眠专家,寻求进一步的建议。
-```
-#### 从本地加载模型
-以上代码会由 `transformers` 自动下载模型实现和参数。完整的模型实现在 [Hugging Face Hub](https://huggingface.co/THUDM/chatglm3-6b)。如果你的网络环境较差，下载模型参数可能会花费较长时间甚至失败。此时可以先将模型下载到本地，然后从本地加载。
-从 Hugging Face Hub 下载模型需要先[安装Git LFS](https://docs.github.com/zh/repositories/working-with-files/managing-large-files/installing-git-large-file-storage)，然后运行
-```Shell
-git clone https://huggingface.co/THUDM/chatglm3-6b
-```
-如果从你从 HuggingFace 下载比较慢，也可以从 [ModelScope](https://modelscope.cn/models/ZhipuAI/chatglm3-6b) 
-中下载。
-### 模型微调
-请参考对话模型微调 [ChatGLM3-6B 微调示例](finetune_chatmodel_demo/README.md),或基座模型微调 [ChatGLM3-6B-base 微调示例](finetune_basemodel_demo/README.md)。
-请注意，不同的微调脚本对应的模型并不相同，请根据需要选择对应的模型。
-### 网页版对话 Demo
-![web-demo](resources/web-demo.gif)
-可以通过以下命令启动基于 Gradio 的网页版 demo：
-```shell
-python web_demo.py
-```
-![web-demo](resources/web-demo2.png)
-可以通过以下命令启动基于 Streamlit 的网页版 demo：
-```shell
-streamlit run web_demo2.py
-```
-网页版 demo 会运行一个 Web Server，并输出地址。在浏览器中打开输出的地址即可使用。 经测试，基于 Streamlit 的网页版 Demo 会更流畅。
-### 命令行对话 Demo
-![cli-demo](resources/cli-demo.png)
-运行仓库中 [cli_demo.py](basic_demo/cli_demo.py)：
-```shell
-python cli_demo.py
-```
-程序会在命令行中进行交互式的对话，在命令行中输入指示并回车即可生成回复，输入 `clear` 可以清空对话历史，输入 `stop` 终止程序。
-### LangChain Demo
-请参考 [基于 LangChain 的工具调用 Demo](langchain_demo/README.md)。
-### 工具调用
-关于工具调用的方法请参考 [工具调用](tool_using/README.md)。 
-### API 部署
-感谢 [@xusenlinzy](https://github.com/xusenlinzy) 实现了 OpenAI 格式的流式 API 部署，可以作为任意基于 ChatGPT 的应用的后端，比如 [ChatGPT-Next-Web](https://github.com/Yidadaa/ChatGPT-Next-Web)。可以通过运行仓库中的[openai_api.py](openai_api_demo/openai_api.py) 进行部署：
-```shell
-cd openai_api_demo
-python openai_api.py
-```
-同时，我们也书写了一个示例代码，用来测试API调用的性能。可以通过运行仓库中的[openai_api_request.py](openai_api_demo/openai_api_request.py) 进行测试
-+ 使用Curl进行测试
-```shell
-curl -X POST "http://127.0.0.1:8000/v1/chat/completions" \
-H "Content-Type: application/json" \
-d "{\"model\": \"chatglm3-6b\", \"messages\": [{\"role\": \"system\", \"content\": \"You are ChatGLM3, a large language model trained by Zhipu.AI. Follow the user's instructions carefully. Respond using markdown.\"}, {\"role\": \"user\", \"content\": \"你好，给我讲一个故事，大概100字\"}], \"stream\": false, \"max_tokens\": 100, \"temperature\": 0.8, \"top_p\": 0.8}"
-````
-+ 使用Python进行测试
-```shell
-cd openai_api_demo
-python openai_api_request.py
-```
-如果测试成功，则模型应该返回一段故事。
-## 低成本部署
-### 模型量化
-默认情况下，模型以 FP16 精度加载，运行上述代码需要大概 13GB 显存。如果你的 GPU 显存有限，可以尝试以量化方式加载模型，使用方法如下：
-```python
-model = AutoModel.from_pretrained("THUDM/chatglm3-6b",trust_remote_code=True).quantize(4).cuda()
-```
-模型量化会带来一定的性能损失，经过测试，ChatGLM3-6B 在 4-bit 量化下仍然能够进行自然流畅的生成。
-### CPU 部署
-如果你没有 GPU 硬件的话，也可以在 CPU 上进行推理，但是推理速度会更慢。使用方法如下（需要大概 32GB 内存）
-```python
-model = AutoModel.from_pretrained("THUDM/chatglm3-6b", trust_remote_code=True).float()
-```
-### Mac 部署
-对于搭载了 Apple Silicon 或者 AMD GPU 的 Mac，可以使用 MPS 后端来在 GPU 上运行 ChatGLM3-6B。需要参考 Apple 的 [官方说明](https://developer.apple.com/metal/pytorch) 安装 PyTorch-Nightly（正确的版本号应该是2.x.x.dev2023xxxx，而不是 2.x.x）。
-目前在 MacOS 上只支持[从本地加载模型](README.md#从本地加载模型)。将代码中的模型加载改为从本地加载，并使用 mps 后端：
-```python
-model = AutoModel.from_pretrained("your local path", trust_remote_code=True).to('mps')
-```
-加载半精度的 ChatGLM3-6B 模型需要大概 13GB 内存。内存较小的机器（比如 16GB 内存的 MacBook Pro），在空余内存不足的情况下会使用硬盘上的虚拟内存，导致推理速度严重变慢。
-### 多卡部署
-如果你有多张 GPU，但是每张 GPU 的显存大小都不足以容纳完整的模型，那么可以将模型切分在多张GPU上。首先安装 accelerate: `pip install accelerate`，然后通过如下方法加载模型：
-```python
-from utils import load_model_on_gpus
-model = load_model_on_gpus("THUDM/chatglm3-6b", num_gpus=2)
-```
-即可将模型部署到两张 GPU 上进行推理。你可以将 `num_gpus` 改为你希望使用的 GPU 数。默认是均匀切分的，你也可以传入 `device_map` 参数来自己指定。 
-## 引用
-如果你觉得我们的工作有帮助的话，请考虑引用下列论文。
-```
-@article{zeng2022glm,
-  title={Glm-130b: An open bilingual pre-trained model},
-  author={Zeng, Aohan and Liu, Xiao and Du, Zhengxiao and Wang, Zihan and Lai, Hanyu and Ding, Ming and Yang, Zhuoyi and Xu, Yifan and Zheng, Wendi and Xia, Xiao and others},
-  journal={arXiv preprint arXiv:2210.02414},
-  year={2022}
-}
-```
-```
-@inproceedings{du2022glm,
-  title={GLM: General Language Model Pretraining with Autoregressive Blank Infilling},
-  author={Du, Zhengxiao and Qian, Yujie and Liu, Xiao and Ding, Ming and Qiu, Jiezhong and Yang, Zhilin and Tang, Jie},
-  booktitle={Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)},
-  pages={320--335},
-  year={2022}
-}
-```
--- a/basic_demo/cli_demo.py
+++ b/basic_demo/cli_demo.py
-import os
-import platform
-from transformers import AutoTokenizer, AutoModel
-import torch
-MODEL_PATH = os.environ.get('MODEL_PATH', '../../chatglm3-6b')
-TOKENIZER_PATH = os.environ.get("TOKENIZER_PATH", MODEL_PATH)
-DEVICE = 'cuda' if torch.cuda.is_available() else 'cpu'
-# For Macs with Apple Silicon (e.g. M1):
-# requires PyTorch built with Metal (MPS) support
-# DEVICE = 'mps'
-# For AMD GPUs such as the MI100 (not yet officially supported):
-# requires PyTorch built with ROCm
-# DEVICE = 'cuda'
-# For Intel GPUs such as the A770 (not yet officially supported):
-# requires PyTorch built with oneDNN, plus intel-extension-for-pytorch
-# import intel_extension_for_pytorch as ipex
-# DEVICE = 'xpu'
-# For Moore Threads GPUs such as the MTT S80 (not yet officially supported):
-# requires PyTorch built with MUSA
-# DEVICE = 'musa'
-tokenizer = AutoTokenizer.from_pretrained(TOKENIZER_PATH, trust_remote_code=True)
-if 'cuda' in DEVICE: # AMD, NVIDIA GPU can use Half Precision
-    model = AutoModel.from_pretrained(MODEL_PATH, trust_remote_code=True).to(DEVICE).eval()
-else: # CPU, Intel GPU and other devices: cast to float32 with .float()
-    model = AutoModel.from_pretrained(MODEL_PATH, trust_remote_code=True).float().to(DEVICE).eval()
-os_name = platform.system()
-clear_command = 'cls' if os_name == 'Windows' else 'clear'
-stop_stream = False
-welcome_prompt = "欢迎使用 ChatGLM3-6B 模型，输入内容即可进行对话，clear 清空对话历史，stop 终止程序"
-def build_prompt(history):
-    prompt = welcome_prompt
-    for query, response in history:
-        prompt += f"\n\n用户：{query}"
-        prompt += f"\n\nChatGLM3-6B：{response}"
-    return prompt
-def main():
-    past_key_values, history = None, []
-    global stop_stream
-    print(welcome_prompt)
-    while True:
-        query = input("\n用户：")
-        if query.strip() == "stop":
-            break
-        if query.strip() == "clear":
-            past_key_values, history = None, []
-            os.system(clear_command)
-            print(welcome_prompt)
-            continue
-        print("\nChatGLM：", end="")
-        current_length = 0
-        for response, history, past_key_values in model.stream_chat(tokenizer, query, history=history, top_p=1,
-                                                                    temperature=0.01,
-                                                                    past_key_values=past_key_values,
-                                                                    return_past_key_values=True):
-            if stop_stream:
-                stop_stream = False
-                break
-            else:
-                print(response[current_length:], end="", flush=True)
-                current_length = len(response)
-        print("")
-if __name__ == "__main__":
-    main()
--- a/basic_demo/cli_demo_bad_word_ids.py
+++ b/basic_demo/cli_demo_bad_word_ids.py
-"""
-This script demonstrates how to use the `bad_words_ids` argument to filter unwanted words out of the model's output.
-"""
-import os
-import platform
-from transformers import AutoTokenizer, AutoModel
-import torch
-MODEL_PATH = os.environ.get('MODEL_PATH', '../../chatglm3-6b')
-TOKENIZER_PATH = os.environ.get("TOKENIZER_PATH", MODEL_PATH)
-DEVICE = 'cuda' if torch.cuda.is_available() else 'cpu'
-tokenizer = AutoTokenizer.from_pretrained(TOKENIZER_PATH, trust_remote_code=True)
-if 'cuda' in DEVICE: # AMD, NVIDIA GPU can use Half Precision
-    model = AutoModel.from_pretrained(MODEL_PATH, trust_remote_code=True).to(DEVICE).eval()
-else: # CPU, Intel GPU and other devices: cast to float32 with .float()
-    model = AutoModel.from_pretrained(MODEL_PATH, trust_remote_code=True).float().to(DEVICE).eval()
-os_name = platform.system()
-clear_command = 'cls' if os_name == 'Windows' else 'clear'
-stop_stream = False
-welcome_prompt = "欢迎使用 ChatGLM3-6B 模型，输入内容即可进行对话，clear 清空对话历史，stop 终止程序"
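-# Words that should not appear in the output; customize as needed. In this example, if the model's answer contains "你好" or "ChatGLM", the following error is raised:
-# probability tensor contains either `inf`, `nan` or element < 0
-bad_words = ["你好", "ChatGLM"]
-# Convert the words to lists of token IDs; each phrase becomes one sub-list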
-bad_word_ids = [tokenizer.encode(bad_word, add_special_tokens=False) for bad_word in bad_words]
-def build_prompt(history):
-    prompt = welcome_prompt
-    for query, response in history:
-        prompt += f"\n\n用户：{query}"
-        prompt += f"\n\nChatGLM3-6B：{response}"
-    return prompt
-def main():
-    past_key_values, history = None, []
-    global stop_stream
-    print(welcome_prompt)
-    while True:
-        query = input("\n用户：")
-        if query.strip().lower() == "stop":
-            break
-        if query.strip().lower() == "clear":
-            past_key_values, history = None, []
-            os.system(clear_command)
-            print(welcome_prompt)
-            continue
-        # Attempt to generate a response
-        try:
-            print("\nChatGLM：", end="")
-            current_length = 0
-            response_generated = False
-            for response, history, past_key_values in model.stream_chat(
-                tokenizer, query, history=history, top_p=1,
-                temperature=0.01,
-                past_key_values=past_key_values,
-                return_past_key_values=True,
-                bad_words_ids=bad_word_ids  # assuming this is implemented correctly
-            ):
-                response_generated = True
-                # Check if the response contains any bad words
-                if any(bad_word in response for bad_word in bad_words):
-                    print("我的回答涉嫌了bad word")
-                    break  # Break the loop if a bad word is detected
-                # Otherwise, print the generated response
-                print(response[current_length:], end="", flush=True)
-                current_length = len(response)
-            if not response_generated:
-                print("没有生成任何回答。")
-        except RuntimeError as e:
-            print(f"生成文本时发生错误：{e}，这可能是涉及到设定的敏感词汇")
-        print("")
-if __name__ == "__main__":
-    main()
\ No newline at end of file
--- a/basic_demo/infer_test.py
+++ b/basic_demo/infer_test.py
-from transformers import AutoTokenizer, AutoModel
-tokenizer = AutoTokenizer.from_pretrained("../../chatglm3-6b", trust_remote_code=True)
-model = AutoModel.from_pretrained("../../chatglm3-6b", trust_remote_code=True, device='cuda')
-model = model.eval()
-response, history = model.chat(tokenizer, "你好", history=[])
-print(response)
-response, history = model.chat(tokenizer, "晚上睡不着应该怎么办", history=history)
-print(response)
-# print(len(tokenizer))
-# vocab_content = tokenizer.get_vocab()
-# with open("vocab.txt", "w", encoding="utf-8") as f:
-#     for token, index in vocab_content.items():
-#         f.write(f"{token} {index}\n")
\ No newline at end of file
--- a/basic_demo/utils.py
+++ b/basic_demo/utils.py
-import os
-from typing import Dict, Union, Optional
-from torch.nn import Module
-from transformers import AutoModel
-def auto_configure_device_map(num_gpus: int) -> Dict[str, int]:
-    # transformer.word_embeddings counts as 1 layer
-    # transformer.final_layernorm and lm_head together count as 1 layer
-    # transformer.layers accounts for 28 layers
-    # 30 layers in total are distributed across num_gpus cards
-    num_trans_layers = 28
-    per_gpu_layers = 30 / num_gpus
-    # bugfix: on Linux, the weight and input passed to torch.embedding can end up on different devices, causing a RuntimeError
-    # on Windows, model.device is set to transformer.word_embeddings.device
-    # on Linux, model.device is set to lm_head.device
-    # when chat or stream_chat is called, input_ids is placed on model.device
-    # if transformer.word_embeddings.device and model.device differ, a RuntimeError is raised
-    # so transformer.word_embeddings, transformer.final_layernorm and lm_head are all placed on the first GPU
-    # this file comes from https://github.com/THUDM/ChatGLM-6B/blob/main/utils.py
-    # with only minor changes here to support ChatGLM3
-    device_map = {
-        'transformer.embedding.word_embeddings': 0,
-        'transformer.encoder.final_layernorm': 0,
-        'transformer.output_layer': 0,
-        'transformer.rotary_pos_emb': 0,
-        'lm_head': 0
-    }
-    used = 2
-    gpu_target = 0
-    for i in range(num_trans_layers):
-        if used >= per_gpu_layers:
-            gpu_target += 1
-            used = 0
-        assert gpu_target < num_gpus
-        device_map[f'transformer.encoder.layers.{i}'] = gpu_target
-        used += 1
-    return device_map
-def load_model_on_gpus(checkpoint_path: Union[str, os.PathLike], num_gpus: int = 2,
-                       device_map: Optional[Dict[str, int]] = None, **kwargs) -> Module:
-    if num_gpus < 2 and device_map is None:
-        model = AutoModel.from_pretrained(checkpoint_path, trust_remote_code=True, **kwargs).half().cuda()
-    else:
-        from accelerate import dispatch_model
-        model = AutoModel.from_pretrained(checkpoint_path, trust_remote_code=True, **kwargs).half()
-        if device_map is None:
-            device_map = auto_configure_device_map(num_gpus)
-        model = dispatch_model(model, device_map=device_map)
-    return model
\ No newline at end of file
--- a/basic_demo/vocab.txt
+++ b/basic_demo/vocab.txt
--- a/basic_demo/web_demo.py
+++ b/basic_demo/web_demo.py
-import os
-from transformers import AutoModel, AutoTokenizer
-import gradio as gr
-import mdtex2html
-from utils import load_model_on_gpus
-import torch
-MODEL_PATH = os.environ.get('MODEL_PATH', '../../chatglm3-6b')
-TOKENIZER_PATH = os.environ.get("TOKENIZER_PATH", MODEL_PATH)
-DEVICE = 'cuda' if torch.cuda.is_available() else 'cpu'
-tokenizer = AutoTokenizer.from_pretrained(TOKENIZER_PATH, trust_remote_code=True)
-if 'cuda' in DEVICE: # AMD, NVIDIA GPU can use Half Precision
-    model = AutoModel.from_pretrained(MODEL_PATH, trust_remote_code=True).to(DEVICE).eval()
-else: # CPU, Intel GPU and other devices: cast to float32 with .float()
-    model = AutoModel.from_pretrained(MODEL_PATH, trust_remote_code=True).float().to(DEVICE).eval()
-# Multi-GPU support: use the two lines below instead of the model-loading code above, and set num_gpus to your actual number of GPUs
-# from utils import load_model_on_gpus
-# model = load_model_on_gpus("THUDM/chatglm3-6b", num_gpus=2)
-"""Override Chatbot.postprocess"""
-def postprocess(self, y):
-    if y is None:
-        return []
-    for i, (message, response) in enumerate(y):
-        y[i] = (
-            None if message is None else mdtex2html.convert((message)),
-            None if response is None else mdtex2html.convert(response),
-        )
-    return y
-gr.Chatbot.postprocess = postprocess
-def parse_text(text):
-    """copy from https://github.com/GaiZhenbiao/ChuanhuChatGPT/"""
-    lines = text.split("\n")
-    lines = [line for line in lines if line != ""]
-    count = 0
-    for i, line in enumerate(lines):
-        if "```" in line:
-            count += 1
-            items = line.split('`')
-            if count % 2 == 1:
-                lines[i] = f'<pre><code class="language-{items[-1]}">'
-            else:
-                lines[i] = f'<br></code></pre>'
-        else:
-            if i > 0:
-                if count % 2 == 1:
-                    line = line.replace("`", "\`")
-                    line = line.replace("<", "&lt;")
-                    line = line.replace(">", "&gt;")
-                    line = line.replace(" ", "&nbsp;")
-                    line = line.replace("*", "&ast;")
-                    line = line.replace("_", "&lowbar;")
-                    line = line.replace("-", "&#45;")
-                    line = line.replace(".", "&#46;")
-                    line = line.replace("!", "&#33;")
-                    line = line.replace("(", "&#40;")
-                    line = line.replace(")", "&#41;")
-                    line = line.replace("$", "&#36;")
-                lines[i] = "<br>" + line
-    text = "".join(lines)
-    return text
-def predict(input, chatbot, max_length, top_p, temperature, history, past_key_values):
-    chatbot.append((parse_text(input), ""))
-    for response, history, past_key_values in model.stream_chat(tokenizer, input, history,
-                                                                past_key_values=past_key_values,
-                                                                return_past_key_values=True,
-                                                                max_length=max_length, top_p=top_p,
-                                                                temperature=temperature):
-        chatbot[-1] = (parse_text(input), parse_text(response))
-        yield chatbot, history, past_key_values
-def reset_user_input():
-    return gr.update(value='')
-def reset_state():
-    return [], [], None
-with gr.Blocks() as demo:
-    gr.HTML("""<h1 align="center">ChatGLM3-6B</h1>""")
-    chatbot = gr.Chatbot()
-    with gr.Row():
-        with gr.Column(scale=4):
-            with gr.Column(scale=12):
-                user_input = gr.Textbox(show_label=False, placeholder="Input...", lines=10).style(
-                    container=False)
-            with gr.Column(min_width=32, scale=1):
-                submitBtn = gr.Button("Submit", variant="primary")
-        with gr.Column(scale=1):
-            emptyBtn = gr.Button("Clear History")
-            max_length = gr.Slider(0, 32768, value=8192, step=1.0, label="Maximum length", interactive=True)
-            top_p = gr.Slider(0, 1, value=0.8, step=0.01, label="Top P", interactive=True)
-            temperature = gr.Slider(0, 1, value=0.6, step=0.01, label="Temperature", interactive=True)
-    history = gr.State([])
-    past_key_values = gr.State(None)
-    submitBtn.click(predict, [user_input, chatbot, max_length, top_p, temperature, history, past_key_values],
-                    [chatbot, history, past_key_values], show_progress=True)
-    submitBtn.click(reset_user_input, [], [user_input])
-    emptyBtn.click(reset_state, outputs=[chatbot, history, past_key_values], show_progress=True)
-demo.queue().launch(share=False, server_name="127.0.0.1", server_port=8501, inbrowser=True)
--- a/basic_demo/web_demo2.py
+++ b/basic_demo/web_demo2.py
-import os
-import streamlit as st
-import torch
-from transformers import AutoModel, AutoTokenizer
-MODEL_PATH = os.environ.get('MODEL_PATH', 'THUDM/chatglm3-6b')
-TOKENIZER_PATH = os.environ.get("TOKENIZER_PATH", MODEL_PATH)
-DEVICE = 'cuda' if torch.cuda.is_available() else 'cpu'
-# Set the page title, icon and layout
-st.set_page_config(
-    page_title="ChatGLM3-6B 演示",
-    page_icon=":robot:",
-    layout="wide"
-)
-@st.cache_resource
-def get_model():
-    tokenizer = AutoTokenizer.from_pretrained(TOKENIZER_PATH, trust_remote_code=True)
-    if 'cuda' in DEVICE:  # AMD, NVIDIA GPU can use Half Precision
-        model = AutoModel.from_pretrained(MODEL_PATH, trust_remote_code=True).to(DEVICE).eval()
-    else:  # CPU, Intel GPU and other devices: cast to float32 with .float()
-        model = AutoModel.from_pretrained(MODEL_PATH, trust_remote_code=True).float().to(DEVICE).eval()
-    # Multi-GPU support: use the two lines below instead of the model-loading code above, and set num_gpus to your actual number of GPUs
-    # from utils import load_model_on_gpus
-    # model = load_model_on_gpus("THUDM/chatglm3-6b", num_gpus=2)
-    return tokenizer, model
-# Load the ChatGLM3 model and tokenizer
-tokenizer, model = get_model()
-# Initialize the chat history and past key values
-if "history" not in st.session_state:
-    st.session_state.history = []
-if "past_key_values" not in st.session_state:
-    st.session_state.past_key_values = None
-# Set max_length, top_p and temperature
-max_length = st.sidebar.slider("max_length", 0, 32768, 8192, step=1)
-top_p = st.sidebar.slider("top_p", 0.0, 1.0, 0.8, step=0.01)
-temperature = st.sidebar.slider("temperature", 0.0, 1.0, 0.6, step=0.01)
-# Clear the chat history
-buttonClean = st.sidebar.button("清理会话历史", key="clean")
-if buttonClean:
-    st.session_state.history = []
-    st.session_state.past_key_values = None
-    if torch.cuda.is_available():
-        torch.cuda.empty_cache()
-    st.rerun()
-# Render the chat history
-for i, message in enumerate(st.session_state.history):
-    if message["role"] == "user":
-        with st.chat_message(name="user", avatar="user"):
-            st.markdown(message["content"])
-    else:
-        with st.chat_message(name="assistant", avatar="assistant"):
-            st.markdown(message["content"])
-# Input and output placeholders
-with st.chat_message(name="user", avatar="user"):
-    input_placeholder = st.empty()
-with st.chat_message(name="assistant", avatar="assistant"):
-    message_placeholder = st.empty()
-# Get the user's input
-prompt_text = st.chat_input("请输入您的问题")
-# If the user entered something, generate a reply
-if prompt_text:
-    input_placeholder.markdown(prompt_text)
-    history = st.session_state.history
-    past_key_values = st.session_state.past_key_values
-    for response, history, past_key_values in model.stream_chat(
-        tokenizer,
-        prompt_text,
-        history,
-        past_key_values=past_key_values,
-        max_length=max_length,
-        top_p=top_p,
-        temperature=temperature,
-        return_past_key_values=True,
-    ):
-        message_placeholder.markdown(response)
-    # Update the chat history and past key values
-    st.session_state.history = history
-    st.session_state.past_key_values = past_key_values
--- a/composite_demo/.streamlit/config.toml
+++ b/composite_demo/.streamlit/config.toml
-[theme]
-font = "monospace"
\ No newline at end of file
--- a/composite_demo/README.md
+++ b/composite_demo/README.md
-# ChatGLM3 Web Demo
-![Demo webpage](assets/demo.png)
-## 安装
-我们建议通过 [Conda](https://docs.conda.io/en/latest/) 进行环境管理。
-执行以下命令新建一个 conda 环境并安装所需依赖：
-```bash
-conda create -n chatglm3-demo python=3.10
-conda activate chatglm3-demo
-pip install -r requirements.txt
-```
-请注意，本项目需要 Python 3.10 或更高版本。
-此外，使用 Code Interpreter 还需要安装 Jupyter 内核：
-```bash
-ipython kernel install --name chatglm3-demo --user
-```
-## 运行
-运行以下命令在本地加载模型并启动 demo：
-```bash
-streamlit run main.py
-```
-之后即可从命令行中看到 demo 的地址，点击即可访问。初次访问需要下载并加载模型，可能需要花费一定时间。
-如果已经在本地下载了模型，可以通过 `export MODEL_PATH=/path/to/model` 来指定从本地加载模型。如果需要自定义 Jupyter 内核，可以通过 `export IPYKERNEL=<kernel_name>` 来指定。
-## 使用
-ChatGLM3 Demo 拥有三种模式：
- Chat: 对话模式，在此模式下可以与模型进行对话。
- Tool: 工具模式，模型除了对话外，还可以通过工具进行其他操作。
- Code Interpreter: 代码解释器模式，模型可以在一个 Jupyter 环境中执行代码并获取结果，以完成复杂任务。
-### 对话模式
-对话模式下，用户可以直接在侧边栏修改 top_p, temperature, System Prompt 等参数来调整模型的行为。例如
-![The model responses following system prompt](assets/emojis.png)
-### 工具模式
-可以通过在 `tool_registry.py` 中注册新的工具来增强模型的能力。只需要使用 `@register_tool` 装饰函数即可完成注册。对于工具声明，函数名称即为工具的名称，函数 docstring 即为工具的说明；对于工具的参数，使用 `Annotated[typ: type, description: str, required: bool]` 标注参数的类型、描述和是否必须。
-例如，`get_weather` 工具的注册如下：
-```python
-@register_tool
-def get_weather(
-    city_name: Annotated[str, 'The name of the city to be queried', True],
-) -> str:
-    """
-    Get the weather for `city_name` in the following week
-    """
-    ...
-```
-![The model uses tool to query the weather of pairs.](assets/tool.png)
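-You can also switch to manual mode with `Manual mode` on the page. In this mode you specify the tool list directly in YAML, but you must feed each tool's output back to the model yourself.
-### Code Interpreter Mode
-Because it has a code execution environment, the model in this mode can carry out more complex tasks, such as drawing charts and performing symbolic computation. Based on its understanding of how far the task has progressed, the model automatically runs several code blocks in succession until the task is complete. In this mode you therefore only need to state the task you want the model to perform.
-For example, we can ask ChatGLM3 to draw a heart: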
-![The code interpreter draws a heart according to the user's instructions.](assets/heart.png)
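-### Additional Tips
- While the model is generating text, you can interrupt it with the `Stop` button at the top right corner of the page.
- Refreshing the page clears the dialogue history.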
-# Enjoy!
\ No newline at end of file
--- a/composite_demo/README_en.md
+++ b/composite_demo/README_en.md
-# ChatGLM3 Web Demo
-![Demo webpage](assets/demo.png)
-## Installation
-We recommend managing environments through [Conda](https://docs.conda.io/en/latest/).
-Execute the following commands to create a new conda environment and install the necessary dependencies:
-```bash
-conda create -n chatglm3-demo python=3.10
-conda activate chatglm3-demo
-pip install -r requirements.txt
-```
-Please note that this project requires Python 3.10 or higher.
-To use the Code Interpreter, you also need to install a Jupyter kernel:
-```bash
-ipython kernel install --name chatglm3-demo --user
-```
-## Execution
-Run the following command to load the model locally and start the demo:
-```bash
-streamlit run main.py
-```
-The demo's address is then printed on the command line; click it to open the demo. The first visit downloads and loads the model, which may take some time.
-If the model is already downloaded locally, set `export MODEL_PATH=/path/to/model` to load it from that path. To use a custom Jupyter kernel, set `export IPYKERNEL=<kernel_name>`.
-## Usage
-ChatGLM3 Demo has three modes:
- Chat: Dialogue mode, where you can interact with the model.
- Tool: Tool mode, where the model, in addition to dialogue, can perform other operations through tools.
- Code Interpreter: Code interpreter mode, where the model can execute code in a Jupyter environment and obtain results to complete complex tasks.
-### Dialogue Mode
-In dialogue mode, you can adjust the model's behavior by changing parameters such as top_p, temperature, and System Prompt directly in the sidebar. For example:
-![The model responds according to the system prompt](assets/emojis.png)
-### Tool Mode
-You can extend the model's capabilities by registering new tools in `tool_registry.py`. Decorating a function with `@register_tool` is enough to register it. In the tool declaration, the function name is the tool's name and the function docstring is the tool's description; for each parameter, use `Annotated[typ: type, description: str, required: bool]` to annotate its type, its description, and whether it is required.
-For example, the registration of the `get_weather` tool is as follows:
-```python
-@register_tool
-def get_weather(
-    city_name: Annotated[str, 'The name of the city to be queried', True],
-) -> str:
-    """
-    Get the weather for `city_name` in the following week
-    """
-    ...
-```
-![The model uses a tool to query the weather in Paris.](assets/tool.png)
-Additionally, you can enter the manual mode through `Manual mode` on the page. In this mode, you can directly specify the tool list through YAML, but you need to manually feed back the tool's output to the model.
-### Code Interpreter Mode
-Because it has a code execution environment, the model in this mode can carry out more complex tasks, such as drawing charts and performing symbolic computation. Based on its understanding of how far the task has progressed, the model automatically runs several code blocks in succession until the task is complete. In this mode you therefore only need to state the task you want the model to perform.
-For example, we can ask ChatGLM3 to draw a heart:
-![The code interpreter draws a heart according to the user's instructions.](assets/heart.png)
-### Additional Tips
- While the model is generating text, it can be interrupted by the `Stop` button at the top right corner of the page.
- Refreshing the page will clear the dialogue history.
-# Enjoy!
\ No newline at end of file
--- a/composite_demo/assets/demo.png
+++ b/composite_demo/assets/demo.png
--- a/composite_demo/assets/emojis.png
+++ b/composite_demo/assets/emojis.png