"vscode:/vscode.git/clone" did not exist on "240abddfbce5e006997f2b53c1ff248e17b3b580"
Commit 78ba9d16 authored by Rayyyyy

Update GLM-4-0414

parent 7fa8c0b3
```bash
python gen_messages_data.py --data_path /path/to/AdvertiseGen
```
- The `tools` field is optional. If a `tools` field is present, it must appear after the `system` role, and a complete conversation (whether single-turn or multi-turn) may contain the `tools` field only once. When the `tools` field is present, the `system` role must exist and its `content` field must be empty; see the sketch after this list.
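For illustration, a minimal sketch of a single record that satisfies these rules, assuming the `tools` list is attached to the `system` message (verify the exact layout against the output of `gen_messages_data.py`; all field values here are illustrative):

```json
{
  "messages": [
    {
      "role": "system",
      "content": "",
      "tools": [
        {
          "type": "function",
          "function": {
            "name": "realtime_aqi",
            "description": "Real-time air quality query",
            "parameters": {
              "type": "object",
              "properties": {"city": {"description": "City name"}},
              "required": ["city"]
            }
          }
        }
      ]
    },
    {"role": "user", "content": "北京今天的天气情况"},
    {"role": "assistant", "content": "北京当前 AQI 为 10。"}
  ]
}
```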
## Training
Download a pretrained model from [Pretrained Weights](#pretrained-weights); the current example uses the [GLM-4-9B-chat](https://huggingface.co/THUDM/glm-4-9b-chat) or [GLM-4-9B-0414](https://huggingface.co/THUDM/GLM-4-9B-0414) model.
### GLM-4-9B-chat native training method
1. Enter the `finetune_demo` directory and first install the required dependencies:
```bash
cd finetune_demo
pip install -r requirements.txt
```
+ `../checkpoints/glm-4-9b-chat/`: model path
+ `configs/lora.yaml`: configuration file path
#### Single-node single-GPU
```shell
bash train.sh
```
#### Single-node multi-GPU / multi-node multi-GPU
`deepspeed` is used here as the acceleration scheme; make sure the `deepspeed` library has already been installed in the current environment as described in the [environment setup section](#环境配置).
```shell
bash train_dp.sh
```
#### Fine-tuning from a checkpoint
If you train as described above, each fine-tuning run starts from scratch. To resume fine-tuning from a partially trained model, add a fourth argument, which can be passed in two ways:
1. `yes`: automatically resume from the **last saved checkpoint**, for example:
```shell
python finetune.py ../data/AdvertiseGen/saves/ ../checkpoints/glm-4-9b-chat/ configs/lora.yaml yes
```
2. A number: training resumes from the checkpoint saved at that step, for example:
```shell
python finetune.py ../data/AdvertiseGen/saves/ ../checkpoints/glm-4-9b-chat/ configs/lora.yaml 600
```
### Llama Factory fine-tuning method (recommended)
Install the training library (**outside the glm-4_pytorch directory**); install a `Llama-Factory` version **newer than v0.9.2**. For the detailed installation procedure, see that repository's README.
```bash
git clone https://developer.sourcefind.cn/codes/OpenDAS/llama-factory
```
#### Full-parameter fine-tuning
For an example SFT training script, see the corresponding yaml file under `llama-factory/train_full`.
**Parameters to modify**
- **--model_name_or_path**: change to the path of the model to be trained, e.g. `/data/GLM-4-9B-0414`
- **--dataset**: name of the fine-tuning dataset; for the available datasets, see `llama-factory/data/dataset_info.json`
- **--template**: change default to `glm4`
- **--output_dir**: directory where the model is saved
Other parameters, such as `--learning_rate` and `--save_steps`, can be adjusted to fit your hardware and needs.
#### LoRA fine-tuning
For an example SFT training script, see the corresponding yaml file under `llama-factory/train_lora`.
Parameter descriptions are the same as for [full-parameter fine-tuning](#full-parameter-fine-tuning); a launch sketch for both variants follows.
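A minimal launch sketch using the standard LLaMA-Factory CLI (the yaml file names below are illustrative; use the actual files shipped under `llama-factory/train_full` and `llama-factory/train_lora`, reproduced at the end of this page):

```bash
# full-parameter SFT
llamafactory-cli train train_full/glm4_full_sft.yaml
# LoRA SFT
llamafactory-cli train train_lora/glm4_lora_sft.yaml
```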
## Inference
### GLM-4-9B-Chat/GLM-4V-9B model inference script
**Parameter description**
- `--model_name_or_path`: name or path of the model to test; defaults to "THUDM/glm-4-9b-chat"
- `--device`: defaults to "cuda"
- `--query`: input query to test; defaults to "你好"
```bash
pip install -U huggingface_hub hf_transfer
export HF_ENDPOINT=https://hf-mirror.com/
cd basic_demo
python quick_start.py
```
#### Chatting with the GLM-4-9B model on the command line
```bash
# chat
python trans_cli_demo.py --model_name_or_path ../checkpoints/glm-4-9b-chat
# multimodal
python trans_cli_vision_demo.py --model_name_or_path ../checkpoints/glm-4v-9b
```
#### Chatting with the GLM-4-9B-Chat model via the Gradio web UI
```bash
python trans_web_demo.py --model_name_or_path ../checkpoints/glm-4-9b-chat
```
#### Validating the fine-tuned model
You can use the fine-tuned model via [finetune_demo/inference.py](./finetune_demo/inference.py), executed as follows.
```shell
python inference.py your_finetune_path
```
### GLM-4-9B-0414/GLM-4-32B-0414/GLM-4-32B-Base-0414 model inference script
```bash
python infer_glm4.py --model_path /path/of/model/ --message "你好"
```
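With the default question, this script exercises the model's function-calling flow end to end; the printed trace looks roughly like the following (illustrative, not captured from a real run; the mock `realtime_aqi` tool is defined in the script source at the end of this page, and the final answer depends on the model):

```
User Message: 北京和上海今天的天气情况
Function Call: {'name': 'realtime_aqi', 'arguments': {'city': '北京'}}
Function Response: {"city": "北京", "aqi": "10", "unit": "celsius"}
Assistant Response: ...
```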
## Results
- GLM-4-9B-Chat inference results
<div align=center>
<img src="./doc/glm4_9b_result.png" width=1500 heigh=400/>
</div>
- GLM-4-9B-0414 inference results
<div align=center>
<img src="./doc/result.png" width=1500 heigh=400/>
<img src="./doc/glm4_9b_0414_result.png" width=1500 heigh=400/>
</div>
### Accuracy
Dataset: AdvertiseGen
Model: GLM-4-9B-Chat
| device | iters | train_loss |
| :------: | :------: | :------: |
### Popular application industries
Home furnishing, education, scientific research
## Pretrained Weights
- [GLM-4-9B](https://huggingface.co/THUDM/glm-4-9b)
- [GLM-4-9B-chat](https://huggingface.co/THUDM/glm-4-9b-chat)
- [GLM-4-9B-0414](https://huggingface.co/THUDM/GLM-4-9B-0414)
- [GLM-4-32B-0414](https://huggingface.co/THUDM/GLM-4-32B-0414)
- [GLM-4-32B-Base-0414](https://huggingface.co/THUDM/GLM-4-32B-Base-0414)
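If direct access to the Hugging Face hub is slow, a minimal download sketch using the mirror endpoint configured earlier (the local target directory is illustrative):

```bash
pip install -U "huggingface_hub[cli]"
export HF_ENDPOINT=https://hf-mirror.com
huggingface-cli download THUDM/GLM-4-9B-0414 --local-dir ../checkpoints/GLM-4-9B-0414
```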
## Source repository and issue feedback
- https://developer.sourcefind.cn/codes/modelzoo/glm-4_pytorch
## References
- https://github.com/THUDM/GLM-4
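The `infer_glm4.py` script added by this commit follows; it implements the function-calling loop used above (generate, detect tool calls, run the mock tool, feed the observation back):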
import json
import re
import ast
import argparse

from transformers import AutoModelForCausalLM, AutoTokenizer


def parse_args():
    parser = argparse.ArgumentParser()
    parser.add_argument('--model_path', type=str, default="THUDM/GLM-4-9B-0414", help='Model path.')
    parser.add_argument('--message', default="北京和上海今天的天气情况", help='Question to ask.')
    args = parser.parse_args()
    return args


def is_function_call(single_message):
    """Parse a function call from a model message; return {'name', 'arguments'} or False."""
    pattern = re.compile(r'([^\n`]*?)\n({.*?})(?=\w*\n|$)', re.DOTALL)
    matches = pattern.findall(single_message)
    if not matches:
        return False
    func_name, args_str = matches[0]
    func_name = func_name.strip()
    try:
        parsed_args = json.loads(args_str)
    except json.JSONDecodeError:
        try:
            parsed_args = ast.literal_eval(args_str)
        except (ValueError, SyntaxError):
            return False
    return {"name": func_name, "arguments": parsed_args}


def realtime_aqi(city):
    """Mock weather/air-quality query tool that returns canned data."""
    if '北京' in city.lower():
        return json.dumps({'city': '北京', 'aqi': '10', 'unit': 'celsius'}, ensure_ascii=False)
    elif '上海' in city.lower():
        return json.dumps({'city': '上海', 'aqi': '72', 'unit': 'fahrenheit'}, ensure_ascii=False)
    else:
        return json.dumps({'city': city, 'aqi': 'unknown'}, ensure_ascii=False)


def build_system_prompt(tools):
    """Construct the system prompt based on the list of available tools."""
    if tools is None:
        tools = []
    value = "# 可用工具"
    contents = []
    for tool in tools:
        content = f"\n\n## {tool['function']['name']}\n\n{json.dumps(tool['function'], ensure_ascii=False, indent=4)}"
        content += "\n在调用上述函数时,请使用 Json 格式表示调用的参数。"
        contents.append(content)
    value += "".join(contents)
    return value


if __name__ == "__main__":
    args = parse_args()
    tokenizer = AutoTokenizer.from_pretrained(args.model_path)
    model = AutoModelForCausalLM.from_pretrained(args.model_path, device_map="auto")

    tools = [
        {
            "type": "function",
            "function": {
                "name": "realtime_aqi",
                "description": "天气预报。获取实时空气质量。当前空气质量,PM2.5,PM10信息",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "city": {
                            "description": "城市名"
                        }
                    },
                    "required": [
                        "city"
                    ]
                }
            }
        }
    ]

    system_prompt = build_system_prompt(tools)
    message = [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": args.message}
    ]
    print(f"User Message: {message[-1]['content']}")

    # Agent loop: generate, execute any requested tool calls, append observations, repeat
    while True:
        inputs = tokenizer.apply_chat_template(
            message,
            return_tensors="pt",
            add_generation_prompt=True,
            return_dict=True,
        ).to(model.device)
        generate_kwargs = {
            "input_ids": inputs["input_ids"],
            "attention_mask": inputs["attention_mask"],
            "max_new_tokens": 1024,
            "do_sample": True,
        }
        out = model.generate(**generate_kwargs)
        generate_resp = tokenizer.decode(out[0][inputs["input_ids"].shape[1]:-1], skip_special_tokens=False)
        stop_sequence = tokenizer.decode(out[0][-1:], skip_special_tokens=False)
        if stop_sequence == "<|user|>":
            print(f"Assistant Response: {generate_resp.strip()}")
            break
        function_calls = []
        for m in generate_resp.split("<|assistant|>"):
            fc_decode = is_function_call(m.strip())
            if fc_decode:
                message.append({"role": "assistant", "metadata": fc_decode['name'], "content": json.dumps(fc_decode['arguments'], ensure_ascii=False)})
                print(f"Function Call: {fc_decode}")
                function_calls.append(fc_decode)
            else:
                message.append({"role": "assistant", "content": m})
                print(f"Assistant Response: {m.strip()}")
        for fc in function_calls:
            function_response = realtime_aqi(
                city=fc["arguments"]["city"],
            )
            print(f"Function Response: {function_response}")
            message.append({"role": "observation", "content": function_response})
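Next, the commit's LLaMA-Factory yaml for full-parameter SFT (the `llama-factory/train_full` config referenced in the fine-tuning section above):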
### model
model_name_or_path: THUDM/GLM-4-9B-0414
trust_remote_code: true
### method
stage: sft
do_train: true
finetuning_type: full
deepspeed: examples/deepspeed/ds_z3_config.json # choices: [ds_z0_config.json, ds_z2_config.json, ds_z3_config.json]
### dataset
dataset: identity,alpaca_en_demo
template: glm4
cutoff_len: 2048
max_samples: 1000
overwrite_cache: true
preprocessing_num_workers: 16
dataloader_num_workers: 4
### output
output_dir: saves/glm-4-9b/full/sft
logging_steps: 1
save_steps: 500
plot_loss: true
overwrite_output_dir: true
save_only_model: false
report_to: none # choices: [none, wandb, tensorboard, swanlab, mlflow]
### train
per_device_train_batch_size: 1
gradient_accumulation_steps: 2
learning_rate: 1.0e-5
num_train_epochs: 3.0
lr_scheduler_type: cosine
warmup_ratio: 0.1
bf16: true
ddp_timeout: 180000000
resume_from_checkpoint: null
### eval
# eval_dataset: alpaca_en_demo
# val_size: 0.1
# per_device_eval_batch_size: 1
# eval_strategy: steps
# eval_steps: 500
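And the corresponding LoRA variant (the `llama-factory/train_lora` config):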
### model
model_name_or_path: THUDM/GLM-4-9B-0414
trust_remote_code: true
### method
stage: sft
do_train: true
finetuning_type: lora
lora_rank: 8
lora_target: all
deepspeed: examples/deepspeed/ds_z3_config.json # choices: [ds_z0_config.json, ds_z2_config.json, ds_z3_config.json]
### dataset
dataset: identity,alpaca_en_demo
template: glm4
cutoff_len: 2048
max_samples: 1000
overwrite_cache: true
preprocessing_num_workers: 16
dataloader_num_workers: 4
### output
output_dir: saves/glm-4-9b/lora/sft
logging_steps: 10
save_steps: 500
plot_loss: true
overwrite_output_dir: true
save_only_model: false
report_to: none # choices: [none, wandb, tensorboard, swanlab, mlflow]
### train
per_device_train_batch_size: 1
gradient_accumulation_steps: 2
learning_rate: 1.0e-4
num_train_epochs: 3.0
lr_scheduler_type: cosine
warmup_ratio: 0.1
bf16: true
ddp_timeout: 180000000
resume_from_checkpoint: null
### eval
# eval_dataset: alpaca_en_demo
# val_size: 0.1
# per_device_eval_batch_size: 1
# eval_strategy: steps
# eval_steps: 500
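Finally, the updated model metadata file: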
# Unique model identifier
modelCode=684
# Model name
modelName=GLM-4_pytorch
# Model description
modelDescription=The GLM-4 series is the open-source edition of Zhipu AI's latest generation of pretrained models. In dataset evaluations covering semantics, mathematics, reasoning, code, and knowledge, GLM-4-9B and its human-preference-aligned version GLM-4-9B-Chat both show performance surpassing Llama-3-8B. The GLM-4-32B-0414 series, at 32 billion parameters, is on par with OpenAI's GPT series and DeepSeek's V3/R1 series, and supports very friendly local deployment.
# Application scenarios
appScenario=Inference,Training,Multi-turn dialogue,Home furnishing,Education,Scientific research
# Framework type