Commit 67ca83cf authored by Rayyyyy

Support GLM-4-0414

parent 78ba9d16
@@ -15,47 +15,45 @@ GLM-4-9B is the open-source version in the GLM-4 series, the latest generation of pre-trained models released by Zhipu AI
</div>

## Environment Setup
`-v path`, `docker_name`, and `imageID` below should be adjusted to your actual setup.
### Docker (Method 1)
```bash
docker pull image.sourcefind.cn:5000/dcu/admin/base/pytorch:2.4.1-ubuntu22.04-dtk25.04-py3.10
docker run -it --shm-size 200g --network=host --name {docker_name} --privileged --device=/dev/kfd --device=/dev/dri --device=/dev/mkfd --group-add video --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -u root -v /path/your_code_data/:/path/your_code_data/ -v /opt/hyhal/:/opt/hyhal/:ro {imageID} bash
cd /your_code_path/glm-4_pytorch
pip install -r inference/requirements.txt
pip install -r finetune/requirements.txt
```
### Dockerfile (Method 2)
```bash
cd ./docker
docker build --no-cache -t glm4-9b:latest .
docker run -it --shm-size 200g --network=host --name {docker_name} --privileged --device=/dev/kfd --device=/dev/dri --device=/dev/mkfd --group-add video --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -u root -v /path/your_code_data/:/path/your_code_data/ -v /opt/hyhal/:/opt/hyhal/:ro {imageID} bash
cd /your_code_path/glm-4_pytorch
pip install -r inference/requirements.txt
pip install -r finetune/requirements.txt
```
### Anaconda (Method 3)
1. The DCU-specific deep learning libraries required by this project can be downloaded and installed from the developer community: https://developer.hpccube.com/tool/
```bash
DTK: 25.04
python: 3.10
torch: 2.4.1
deepspeed: 0.14.2+das.opt2.dtk2504
```
**Tips**: the versions of the DTK stack, python, torch, and the other DCU-related tools listed above must correspond to each other exactly.
2. The remaining, ordinary libraries can be installed directly with the following steps:
```bash
pip install -r inference/requirements.txt
pip install -r finetune/requirements.txt
```
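After the libraries are installed, you can quickly confirm that the DCU build of torch is usable. The snippet below is a minimal sketch; it assumes the DTK/DAS build of torch exposes DCU devices through the usual `torch.cuda` API:
```python
import torch

# Print the torch build and check that DCU devices are visible.
# On DTK/DAS builds of torch the devices are typically exposed via the torch.cuda API (assumption).
print("torch version:", torch.__version__)
print("device available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("device count:", torch.cuda.device_count())
    print("device name:", torch.cuda.get_device_name(0))
```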
## Dataset
@@ -80,33 +78,93 @@ python gen_messages_data.py --data_path /path/to/AdvertiseGen
- Here is an example without tools:
```json
{
  "messages": [
    {
      "role": "user",
      "content": "类型#裤*材质#牛仔布*风格#性感"
    },
    {
      "role": "assistant",
      "content": "3x1的这款牛仔裤采用浅白的牛仔面料为裤身材质,其柔然的手感和细腻的质地,在穿着舒适的同时,透露着清纯甜美的个性气质。除此之外,流畅的裤身剪裁将性感的腿部曲线彰显的淋漓尽致,不失为一款随性出街的必备单品。"
    }
  ]
}
```
- Here is an example with a tool call:
```json
{
  "messages": [
    {
      "role": "system",
      "content": "",
      "tools": [
        {
          "type": "function",
          "function": {
            "name": "get_recommended_books",
            "description": "Get recommended books based on user's interests",
            "parameters": {
              "type": "object",
              "properties": {
                "interests": {
                  "type": "array",
                  "items": {
                    "type": "string"
                  },
                  "description": "The interests to recommend books for"
                }
              },
              "required": [
                "interests"
              ]
            }
          }
        }
      ]
    },
    {
      "role": "user",
      "content": "Hi, I am looking for some book recommendations. I am interested in history and science fiction."
    },
    {
      "role": "assistant",
      "content": "{\"name\": \"get_recommended_books\", \"arguments\": {\"interests\": [\"history\", \"science fiction\"]}}"
    },
    {
      "role": "observation",
      "content": "{\"books\": [\"Sapiens: A Brief History of Humankind by Yuval Noah Harari\", \"A Brief History of Time by Stephen Hawking\", \"Dune by Frank Herbert\", \"The Martian by Andy Weir\"]}"
    },
    {
      "role": "assistant",
      "content": "Based on your interests in history and science fiction, I would recommend the following books: \"Sapiens: A Brief History of Humankind\" by Yuval Noah Harari, \"A Brief History of Time\" by Stephen Hawking, \"Dune\" by Frank Herbert, and \"The Martian\" by Andy Weir."
    }
  ]
}
```
- The `system` role is optional, but if present it must appear before the `user` role, and a complete conversation (whether single-turn or multi-turn) may contain the `system` role only once.
- The `tools` field is optional; if present it must appear after the `system` role, and a complete conversation (whether single-turn or multi-turn) may contain the `tools` field only once. When the `tools` field is present, the `system` role must exist and its `content` field must be empty (see the validation sketch below).
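The constraints above can be checked programmatically before training. The following is a minimal validation sketch; the function name `check_messages` and the data path are illustrative, not part of the project:
```python
import json

def check_messages(record: dict) -> None:
    """Validate one fine-tuning record against the system/tools rules described above (illustrative only)."""
    messages = record["messages"]
    system_msgs = [m for m in messages if m["role"] == "system"]
    if len(system_msgs) > 1:
        raise ValueError("a conversation may contain at most one system message")
    if system_msgs and messages[0]["role"] != "system":
        raise ValueError("the system message must appear before the user messages")
    for m in messages:
        if "tools" in m:
            if m["role"] != "system":
                raise ValueError("the tools field may only be attached to the system message")
            if m.get("content", "") != "":
                raise ValueError("when tools are provided, the system content must be empty")

# Hypothetical path to a prepared .jsonl file.
with open("data/AdvertiseGen/saves/train.jsonl", encoding="utf-8") as f:
    for line in f:
        check_messages(json.loads(line))
```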
## Training
Download the pre-trained model via [预训练权重](#预训练权重) (pre-trained weights); the current examples use the [GLM-4-9B-chat](https://huggingface.co/THUDM/glm-4-9b-chat) or [GLM-4-9B-0414](https://huggingface.co/THUDM/GLM-4-9B-0414) model.
### Native Training Method
1. Enter the `finetune` directory:
```bash
cd finetune
```
2. The configuration files are located in the [configs](./finetune/configs/) directory and include the following files:
- DeepSpeed configuration files: [ds_zereo_2](./finetune/configs/ds_zereo_2.json) and [ds_zereo_3](./finetune/configs/ds_zereo_3.json)
- `lora.yaml / sft.yaml`: configuration files for the different fine-tuning methods, covering model parameters, optimizer parameters, training parameters, and so on. Some important parameters are explained below (an illustrative sketch follows this list):
+ data_config section
+ train_file: file path of the training dataset.
+ val_file: file path of the validation dataset.
@@ -143,20 +201,17 @@ pip install -r requirements.txt
+ num_attention_heads: 2: number of attention heads for P-Tuning v2 (do not change).
+ token_dim: 256: token dimension for P-Tuning v2 (do not change).
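To make the relationship between these fields concrete, here is an illustrative Python sketch that loads a fine-tuning config and reads the `data_config` paths; the key names mirror the explanations above, but the exact layout of the shipped yaml files may differ:
```python
import yaml  # requires pyyaml

# Load a fine-tuning config and print the dataset paths (illustrative; adjust the path as needed).
with open("configs/lora.yaml", encoding="utf-8") as f:
    config = yaml.safe_load(f)

data_config = config["data_config"]              # the data_config section described above
print("train_file:", data_config["train_file"])  # path to the training dataset
print("val_file:", data_config["val_file"])      # path to the validation dataset
```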
#### Single Node, Single GPU
```shell
# For Chat Fine-tune
python finetune.py data/AdvertiseGen/ THUDM/GLM-4-9B-0414 configs/lora.yaml
```
#### Single Node Multi-GPU / Multi-Node Multi-GPU
`deepspeed` is used here as the acceleration scheme. Make sure the `deepspeed` library has already been installed in the current environment as described in the [environment setup section](#环境配置).
```shell
# For Chat Fine-tune
OMP_NUM_THREADS=1 torchrun --standalone --nnodes=1 --nproc_per_node=8 finetune.py data/AdvertiseGen/ THUDM/GLM-4-9B-0414 configs/lora.yaml
```
#### Fine-tuning from a Checkpoint
@@ -164,12 +219,12 @@ bash train_dp.sh
1. `yes`: automatically resume training from the **last saved checkpoint**, for example:
```shell
python finetune.py ../data/AdvertiseGen/saves/ THUDM/GLM-4-9B-0414 configs/lora.yaml yes
```
2. `XX`: a checkpoint number, e.g. `600` resumes training from **checkpoint 600**, for example:
```shell
python finetune.py ../data/AdvertiseGen/saves/ THUDM/GLM-4-9B-0414 configs/lora.yaml 600
```
### Llama Factory Fine-tuning Method (Recommended)
@@ -194,41 +249,26 @@ For example SFT training scripts, refer to the corresponding yaml files under `llama-factory/train_lora`.
The parameters are explained in the same way as in [#全参微调](#全参微调) (full-parameter fine-tuning).
## Inference
```shell
cd inference
```
### Using the transformers Backend
#### Chat with the GLM-4-9B Model from the Command Line
```shell
# Set MODEL_PATH in the code to the path of the model under test
# Defaults to the GLM-4-9B-0414 model
python trans_cli_demo.py
```
#### Chat with the GLM-4-9B-Chat Model via the Gradio Web UI
```shell
# Set MODEL_PATH in the code to the path of the model under test
# Defaults to the GLM-4-9B-0414 model
python trans_web_demo.py
```
### GLM-4-9B-0414 / GLM-4-32B-0414 / GLM-4-32B-Base-0414 Inference Script
```shell
python infer_glm4.py --model_path /path/of/model/ --message "你好"
```
......
'''based on transformers'''
import torch
import argparse
from transformers import AutoModelForCausalLM, AutoTokenizer

parse = argparse.ArgumentParser()
parse.add_argument('--model_name_or_path', default="THUDM/glm-4-9b-chat")
parse.add_argument('--device', default="cuda")
parse.add_argument('--query', type=str, default="你好")
args = parse.parse_args()

device = args.device
model_name_or_path = args.model_name_or_path
tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, trust_remote_code=True)

query = args.query
inputs = tokenizer.apply_chat_template([{"role": "user", "content": query}],
                                       add_generation_prompt=True,
                                       tokenize=True,
                                       return_tensors="pt",
                                       return_dict=True
                                       )
inputs = inputs.to(device)
model = AutoModelForCausalLM.from_pretrained(
    model_name_or_path,
    torch_dtype=torch.bfloat16,
    low_cpu_mem_usage=True,
    trust_remote_code=True
).to(device).eval()

gen_kwargs = {"max_length": 2500, "do_sample": True, "top_k": 1}
with torch.no_grad():
    outputs = model.generate(**inputs, **gen_kwargs)
    outputs = outputs[:, inputs['input_ids'].shape[1]:]
    print('Result', tokenizer.decode(outputs[0], skip_special_tokens=True))
transformers==4.40.0
huggingface-hub>=0.23.1
sentencepiece>=0.2.0
pydantic>=2.7.1
timm>=0.9.16
tiktoken>=0.7.0
accelerate>=0.30.1
sentence_transformers>=2.7.0
# web demo
gradio>=4.31.5
"""
This script creates an interactive web demo for the GLM-4-9B model using Gradio,
a Python library for building quick and easy UI components for machine learning models.
It's designed to showcase the capabilities of the GLM-4-9B model in a user-friendly interface,
allowing users to interact with the model through a chat-like interface.
"""
import os
import argparse
import torch
import gradio as gr
from threading import Thread
from typing import Union
from pathlib import Path
from peft import AutoPeftModelForCausalLM, PeftModelForCausalLM
from transformers import (
AutoModelForCausalLM,
AutoTokenizer,
PreTrainedModel,
PreTrainedTokenizer,
PreTrainedTokenizerFast,
StoppingCriteria,
StoppingCriteriaList,
TextIteratorStreamer
)
ModelType = Union[PreTrainedModel, PeftModelForCausalLM]
TokenizerType = Union[PreTrainedTokenizer, PreTrainedTokenizerFast]
# add model path
parser = argparse.ArgumentParser()
parser.add_argument('--model_name_or_path', default='THUDM/glm-4-9b-chat')
args = parser.parse_args()
# MODEL_PATH = os.environ.get('MODEL_PATH', 'THUDM/glm-4-9b-chat')
MODEL_PATH = args.model_name_or_path
TOKENIZER_PATH = os.environ.get("TOKENIZER_PATH", MODEL_PATH)
def _resolve_path(path: Union[str, Path]) -> Path:
return Path(path).expanduser().resolve()
def load_model_and_tokenizer(
model_dir: Union[str, Path], trust_remote_code: bool = True
) -> tuple[ModelType, TokenizerType]:
model_dir = _resolve_path(model_dir)
if (model_dir / 'adapter_config.json').exists():
model = AutoPeftModelForCausalLM.from_pretrained(
model_dir, trust_remote_code=trust_remote_code, device_map='auto'
)
tokenizer_dir = model.peft_config['default'].base_model_name_or_path
else:
model = AutoModelForCausalLM.from_pretrained(
model_dir, trust_remote_code=trust_remote_code, device_map='auto'
)
tokenizer_dir = model_dir
tokenizer = AutoTokenizer.from_pretrained(
tokenizer_dir, trust_remote_code=trust_remote_code, use_fast=False
)
return model, tokenizer
model, tokenizer = load_model_and_tokenizer(MODEL_PATH, trust_remote_code=True)
class StopOnTokens(StoppingCriteria):
def __call__(self, input_ids: torch.LongTensor, scores: torch.FloatTensor, **kwargs) -> bool:
stop_ids = model.config.eos_token_id
for stop_id in stop_ids:
if input_ids[0][-1] == stop_id:
return True
return False
def parse_text(text):
lines = text.split("\n")
lines = [line for line in lines if line != ""]
count = 0
for i, line in enumerate(lines):
if "```" in line:
count += 1
items = line.split('`')
if count % 2 == 1:
lines[i] = f'<pre><code class="language-{items[-1]}">'
else:
lines[i] = f'<br></code></pre>'
else:
if i > 0:
if count % 2 == 1:
line = line.replace("`", "\`")
line = line.replace("<", "&lt;")
line = line.replace(">", "&gt;")
line = line.replace(" ", "&nbsp;")
line = line.replace("*", "&ast;")
line = line.replace("_", "&lowbar;")
line = line.replace("-", "&#45;")
line = line.replace(".", "&#46;")
line = line.replace("!", "&#33;")
line = line.replace("(", "&#40;")
line = line.replace(")", "&#41;")
line = line.replace("$", "&#36;")
lines[i] = "<br>" + line
text = "".join(lines)
return text
def predict(history, max_length, top_p, temperature):
stop = StopOnTokens()
messages = []
for idx, (user_msg, model_msg) in enumerate(history):
if idx == len(history) - 1 and not model_msg:
messages.append({"role": "user", "content": user_msg})
break
if user_msg:
messages.append({"role": "user", "content": user_msg})
if model_msg:
messages.append({"role": "assistant", "content": model_msg})
model_inputs = tokenizer.apply_chat_template(messages,
add_generation_prompt=True,
tokenize=True,
return_tensors="pt").to(next(model.parameters()).device)
streamer = TextIteratorStreamer(tokenizer, timeout=60, skip_prompt=True, skip_special_tokens=True)
generate_kwargs = {
"input_ids": model_inputs,
"streamer": streamer,
"max_new_tokens": max_length,
"do_sample": True,
"top_p": top_p,
"temperature": temperature,
"stopping_criteria": StoppingCriteriaList([stop]),
"repetition_penalty": 1.2,
"eos_token_id": model.config.eos_token_id,
}
t = Thread(target=model.generate, kwargs=generate_kwargs)
t.start()
for new_token in streamer:
if new_token:
history[-1][1] += new_token
yield history
with gr.Blocks() as demo:
gr.HTML("""<h1 align="center">GLM-4-9B Gradio Simple Chat Demo</h1>""")
chatbot = gr.Chatbot()
with gr.Row():
with gr.Column(scale=4):
with gr.Column(scale=12):
user_input = gr.Textbox(show_label=False, placeholder="Input...", lines=10, container=False)
with gr.Column(min_width=32, scale=1):
submitBtn = gr.Button("Submit")
with gr.Column(scale=1):
emptyBtn = gr.Button("Clear History")
max_length = gr.Slider(0, 32768, value=8192, step=1.0, label="Maximum length", interactive=True)
top_p = gr.Slider(0, 1, value=0.8, step=0.01, label="Top P", interactive=True)
temperature = gr.Slider(0.01, 1, value=0.6, step=0.01, label="Temperature", interactive=True)
def user(query, history):
return "", history + [[parse_text(query), ""]]
submitBtn.click(user, [user_input, chatbot], [user_input, chatbot], queue=False).then(
predict, [chatbot, max_length, top_p, temperature], chatbot
)
emptyBtn.click(lambda: None, None, chatbot, queue=False)
demo.queue()
demo.launch(server_name="127.0.0.1", server_port=8000, inbrowser=True, share=True)
accelerate
huggingface_hub>=0.19.4
ipykernel>=6.26.0
ipython>=8.18.1
jupyter_client>=8.6.0
langchain
langchain-community
matplotlib
pillow>=10.1.0
pymupdf
python-docx
python-pptx
pyyaml>=6.0.1
requests>=2.31.0
sentencepiece
streamlit>=1.35.0
tiktoken
transformers==4.40.0
zhipuai>=2.1.0
# Please install vllm if you'd like to use the long-context model.
# vllm
BROWSER_SERVER_URL = 'http://localhost:3000'
IPYKERNEL = 'glm-4-demo'
ZHIPU_AI_KEY = ''
COGVIEW_MODEL = 'cogview-3'
@@ -38,31 +38,39 @@ pnpm install
1. Modify `BING_SEARCH_API_KEY` in `browser/src/config.ts` to configure the Bing Search API Key that the browser service needs to use:
```ts
export default {
  BROWSER_TIMEOUT: 10000,
  BING_SEARCH_API_URL: 'https://api.bing.microsoft.com/v7.0',
  BING_SEARCH_API_KEY: '<PUT_YOUR_BING_SEARCH_KEY_HERE>',
  HOST: 'localhost',
  PORT: 3000,
};
```
If you registered for the Bing Custom Search API, you can modify your configuration file as follows and fill in your Custom Configuration ID:
```ts
export default {
  LOG_LEVEL: 'debug',
  BROWSER_TIMEOUT: 10000,
  BING_SEARCH_API_URL: 'https://api.bing.microsoft.com/v7.0/custom/',
  BING_SEARCH_API_KEY: 'YOUR_BING_SEARCH_API_KEY',
  CUSTOM_CONFIG_ID: 'YOUR_CUSTOM_CONFIG_ID', // put your Custom Configuration ID here
  HOST: 'localhost',
  PORT: 3000,
};
```
2. The text-to-image feature needs to call the CogView API. Modify `src/tools/config.py` and provide the [Zhipu AI Open Platform](https://open.bigmodel.cn) API Key required for text-to-image:
```python
BROWSER_SERVER_URL = 'http://localhost:3000'
IPYKERNEL = 'glm-4-demo'
ZHIPU_AI_KEY = '<PUT_YOUR_ZHIPU_AI_KEY_HERE>'
COGVIEW_MODEL = 'cogview-3'
```
@@ -82,11 +90,13 @@ pnpm install
You will then see the demo URL in the command line; click it to open the demo. The first visit needs to download and load the model, which may take some time.
If the model has already been downloaded locally, you can load it from the local path by setting `export *_MODEL_PATH=/path/to/model`. The models that can be specified include:
- `CHAT_MODEL_PATH`: used for All Tools mode and document interpretation mode, defaults to `THUDM/glm-4-9b-chat`
- `VLM_MODEL_PATH`: used for VLM mode, defaults to `THUDM/glm-4v-9b`
The Chat model supports inference with [vLLM](https://github.com/vllm-project/vllm). To use it, install vLLM and set the environment variable `USE_VLLM=1`.
The Chat model also supports inference through the [OpenAI API](https://platform.openai.com/docs/api-reference/introduction). To use it, start `openai_api_server.py` under the `inference` directory and set the environment variable `USE_API=1`. This decouples the inference server from the demo server.
If you need to customize the Jupyter kernel, you can specify it with `export IPYKERNEL=<kernel_name>`.
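As a sketch of how these variables drive the demo (a restatement of the documented behaviour above, not the demo's actual code), the resolution roughly amounts to:
```python
import os

# Documented environment variables and their documented defaults; the demo's real lookup code may differ.
CHAT_MODEL_PATH = os.environ.get("CHAT_MODEL_PATH", "THUDM/glm-4-9b-chat")  # All Tools / document interpretation
VLM_MODEL_PATH = os.environ.get("VLM_MODEL_PATH", "THUDM/glm-4v-9b")        # VLM mode
USE_VLLM = os.environ.get("USE_VLLM") == "1"   # serve the Chat model with vLLM
USE_API = os.environ.get("USE_API") == "1"     # talk to an external openai_api_server instead of loading locally
IPYKERNEL = os.environ.get("IPYKERNEL", "glm-4-demo")

print(CHAT_MODEL_PATH, VLM_MODEL_PATH, USE_VLLM, USE_API, IPYKERNEL)
```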
## Usage
......
@@ -42,31 +42,26 @@ pnpm install
needs to use:
```ts
export default {
  BROWSER_TIMEOUT: 10000,
  BING_SEARCH_API_URL: 'https://api.bing.microsoft.com/v7.0',
  BING_SEARCH_API_KEY: '<PUT_YOUR_BING_SEARCH_KEY_HERE>',
  HOST: 'localhost',
  PORT: 3000,
};
```
2. The text-to-image (Wenshengtu) feature needs to call the CogView API. Modify `src/tools/config.py` to provide the [Zhipu AI Open Platform](https://open.bigmodel.cn) API Key required for the text-to-image feature:
```python
BROWSER_SERVER_URL = 'http://localhost:3000'
IPYKERNEL = 'glm-4-demo'
ZHIPU_AI_KEY = '<PUT_YOUR_ZHIPU_AI_KEY_HERE>'
COGVIEW_MODEL = 'cogview-3'
```
@@ -96,6 +91,8 @@ by `export *_MODEL_PATH=/path/to/model`. The models that can be specified include:
The Chat model supports inference using [vLLM](https://github.com/vllm-project/vllm). To use it, please install vLLM and set the environment variable `USE_VLLM=1`.
The Chat model also supports inference through the [OpenAI API](https://platform.openai.com/docs/api-reference/introduction). To use it, please run `openai_api_server.py` in `inference` and set the environment variable `USE_API=1`. This allows the inference server and the demo server to be deployed on different machines.
If you need to customize the Jupyter kernel, you can specify it by `export IPYKERNEL=<kernel_name>`.
## Usage
@@ -141,7 +138,7 @@ Users can upload documents and use the long text capability of GLM-4-9B to understand
pdf and other files.
+ Tool calls and system prompt words are not supported in this mode.
+ If the text is very long, the model may require a large amount of GPU memory. Please confirm your hardware configuration.
## Image Understanding Mode
......