# Using Intel® Extension for Transformers to Run Inference with the GLM-4-9B-Chat Model
This example shows how to use Intel® Extension for Transformers to run inference with the GLM-4-9B-Chat model.
## Device and Dependency Check
### Relevant Inference Test Data
**The data in this document was measured on the hardware environment listed below. Actual requirements and memory usage may vary slightly; please refer to your own running environment.**
Test hardware information:
+ OS: Ubuntu 22.04 (This tutorial must be executed in a Linux environment)
+ Memory: 512GB
+ Python: 3.10.12
+ CPU: Intel(R) Xeon(R) Platinum 8358 CPU / 12th Gen Intel i5-12400
## Installing Dependencies
Before starting inference, please install the dependencies in `inference` and then the dependencies in this directory:
```shell
pip install -r requirements.txt
```
## Running Model Inference
```shell
python itrex_cli_demo.py
```
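Under the hood, `itrex_cli_demo.py` relies on Intel® Extension for Transformers' drop-in replacement for the Transformers API. The following is a minimal sketch of that pattern; the exact arguments used by the demo script may differ:
```python
# Minimal sketch of ITREX weight-only quantized inference; the argument
# choices here are illustrative, not necessarily what itrex_cli_demo.py uses.
from transformers import AutoTokenizer
from intel_extension_for_transformers.transformers import AutoModelForCausalLM

model_name = "THUDM/glm-4-9b-chat"  # or a local checkpoint path
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
# load_in_4bit=True converts the weights on first use; the converted files
# are what end up in the runtime_outputs folder described below.
model = AutoModelForCausalLM.from_pretrained(
    model_name, load_in_4bit=True, trust_remote_code=True
)

inputs = tokenizer("Hello, who are you?", return_tensors="pt").input_ids
outputs = model.generate(inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```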
If this is your first inference run, the model weights will first be converted. The converted weights are stored in the `runtime_outputs` folder and consume about 60 GB of disk space.
After the conversion is completed, the folder contains the converted weight files, including:
+ `ne_chatglm2_f32.bin` 52 GB (if you do not use FP32 for inference, you can delete this file)
# Using OpenVINO to Deploy the GLM-4-9B-Chat Model
OpenVINO is an open source toolkit designed by Intel for deep learning inference. It can help developers optimize models, improve inference performance, and reduce model memory usage.
This example will show how to deploy the GLM-4-9B-Chat model using OpenVINO.
## 1. Environment configuration
First, you need to install the dependencies:
```bash
pip install -r requirements.txt
```
## 2. Convert the model
Since the Hugging Face model needs to be converted to an OpenVINO IR model, you need to download the model and convert it first.
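As an illustration of what this conversion involves, `optimum-intel` provides a one-step export via `optimum-cli`. The command below is a sketch under that assumption (the output directory name and weight format are arbitrary choices), not necessarily the exact route this example uses:
```bash
# Sketch: export the Hugging Face checkpoint to an OpenVINO IR model with
# INT4 weight compression, using optimum-intel's CLI (assumed installed).
optimum-cli export openvino \
    --model THUDM/glm-4-9b-chat \
    --task text-generation-with-past \
    --weight-format int4 \
    glm-4-9b-chat-ov
```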
# Fine-tuning the GLM-4-9B-Chat Model
Before starting fine-tuning, please install the dependencies in `inference`, ensure you have cloned the latest version of the model repository, and install the dependencies in this directory:
```bash
pip install -r requirements.txt
```
...
...
For data files, the sample uses the following format:
"content":"{\"books\": [\"Sapiens: A Brief History of Humankind by Yuval Noah Harari\", \"A Brief History of Time by Stephen Hawking\", \"Dune by Frank Herbert\", \"The Martian by Andy Weir\"]}"
},
{
"role":"assistant",
"content":"Based on your interests in history and science fiction, I would recommend the following books: \"Sapiens: A Brief History of Humankind\" by Yuval Noah Harari, \"A Brief History of Time\" by Stephen Hawking, \"Dune\" by Frank Herbert, and \"The Martian\" by Andy Weir."
}
]
}
```
{"messages": [{"role": "system", "content": "", "tools": [{"type": "function", "function": {"name": "get_recommended_books", "description": "Get recommended books based on user's interests", "parameters": {"type": "object", "properties": {"interests": {"type": "array", "items": {"type": "string"}, "description": "The interests to recommend books for"}}, "required": ["interests"]}}}]}, {"role": "user", "content": "Hi, I am looking for some book recommendations. I am interested in history and science fiction."}, {"role": "assistant", "content": "{\"name\": \"get_recommended_books\", \"arguments\": {\"interests\": [\"history\", \"science fiction\"]}}"}, {"role": "observation", "content": "{\"books\": [\"Sapiens: A Brief History of Humankind by Yuval Noah Harari\", \"A Brief History of Time by Stephen Hawking\", \"Dune by Frank Herbert\", \"The Martian by Andy Weir\"]}"}, {"role": "assistant", "content": "Based on your interests in history and science fiction, I would recommend the following books: \"Sapiens: A Brief History of Humankind\" by Yuval Noah Harari, \"A Brief History of Time\" by Stephen Hawking, \"Dune\" by Frank Herbert, and \"The Martian\" by Andy Weir."}]}
This is a sample for the VQA task:
```json
{
  "messages": [
    {
      "role": "user",
      "content": "What animal is in the image?",
      "image": "/root/images/0001.jpg"
    },
    {
      "role": "assistant",
      "content": "There is a cat in the image."
    },
    {
      "role": "user",
      "content": "What is the cat in the image doing?"
    },
    {
      "role": "assistant",
      "content": "The cat is sitting or standing on a table that has a lot of food on it."
    }
  ]
}
```
- The `system` role is optional, but if it exists, it must appear before the `user` role, and the `system` role can only appear once in a complete conversation (whether single-round or multi-round).
- The `tools` field is optional, but if it exists, it must appear after the `system` role, and the `tools` field can only appear once in a complete conversation (whether single-round or multi-round). When the `tools` field exists, the `system` role must exist and its `content` field must be empty.
- `GLM-4V-9B` does not support the `tools` field or the `system` field, and the `image` field must be placed in the first message and must contain the absolute path of the image.
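Since several ordering rules apply at once, a small validation helper can catch malformed samples early. The following is a hypothetical sketch, not part of this repository:
```python
# Hypothetical helper (not part of this repository) that enforces the
# conversation-format rules described above on a single sample.
def validate_messages(messages: list[dict]) -> None:
    roles = [m["role"] for m in messages]
    # The system role may appear at most once, and only as the first message.
    if roles.count("system") > 1:
        raise ValueError("at most one system role per conversation")
    if "system" in roles and roles[0] != "system":
        raise ValueError("the system role must come before the user role")
    for i, m in enumerate(messages):
        # tools may only accompany a system role whose content is empty.
        if "tools" in m and (m["role"] != "system" or m["content"] != ""):
            raise ValueError("tools requires a system role with empty content")
        # For GLM-4V-9B, the image belongs in the first message only.
        if "image" in m and i != 0:
            raise ValueError("image must be placed in the first message")
```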
## Configuration file
...
...
The fine-tuning configuration file is located in the `config` directory, including:
2. `lora.yaml / ptuning_v2.yaml / sft.yaml`: Configuration files for different modes of models, including model parameters, optimizer parameters, training parameters, etc. Some important parameters are explained as follows:
+ data_config section
+ train_file: File path of training dataset.
+ val_file: File path of validation dataset.
+ test_file: File path of test dataset.
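For orientation, a minimal `data_config` block in one of these YAML files could look like the sketch below; the file names are placeholders, not shipped defaults:
```yaml
# Sketch of a data_config section; file names are placeholders.
data_config:
  train_file: train.jsonl
  val_file: dev.jsonl
  test_file: dev.jsonl
```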
...
...
+ r: rank of LoRA.
+ lora_alpha: scaling factor of LoRA.
+ lora_dropout: dropout probability to use in LoRA layer.
+ P-TuningV2 parameters:
+ num_virtual_tokens: the number of virtual tokens.
+ num_attention_heads: 2: the number of attention heads of P-TuningV2 (do not change).
+ token_dim: 256: the token dimension of P-TuningV2 (do not change).
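Taken together, a P-TuningV2 section in `ptuning_v2.yaml` would look roughly like the following; the key names and the `num_virtual_tokens` value are assumptions for illustration:
```yaml
# Rough sketch of a P-TuningV2 PEFT section; key names are assumptions.
peft_config:
  peft_type: PREFIX_TUNING
  num_virtual_tokens: 512
  num_attention_heads: 2  # do not change
  token_dim: 256          # do not change
```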
...
...
Execute a **single machine multi-card/multi-machine multi-card** run with the following code, which uses `deepspeed` as the acceleration solution; you need to install `deepspeed`:
```shell
python finetune.py data/AdvertiseGen/ THUDM/GLM-4-9B-0414 configs/lora.yaml  # For Chat Fine-tune
python finetune_vision.py data/CogVLM-311K/ THUDM/glm-4v-9b configs/lora.yaml  # For VQA Fine-tune
```
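For an actual multi-card launch, a typical `torchrun` invocation looks like the sketch below; the GPU count and the `OMP_NUM_THREADS` setting are illustrative assumptions:
```shell
# Illustrative single-machine, 8-card launch; adjust --nproc_per_node to
# match your GPU count.
OMP_NUM_THREADS=1 torchrun --standalone --nnodes=1 --nproc_per_node=8 \
    finetune.py data/AdvertiseGen/ THUDM/GLM-4-9B-0414 configs/lora.yaml
```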
## Log Visualization Support
The fine-tuning code supports using SwanLab to visualize and track training metrics. You can enable tracking by installing SwanLab:
```shell
pip install swanlab
```
You can visit the [SwanLab Visualization Dashboard](https://swanlab.cn/@ShaohonChen/GLM4-Finetune) to view the training logs of example fine-tuning scripts.
If prompted to log in, you can obtain an API Key by visiting [https://swanlab.cn/space/~/settings](https://swanlab.cn/space/~/settings).
If you only want to use the local dashboard, set `swanlab: local` in the configuration parameters and use the `swanlab watch` command to start the offline dashboard.
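For the local-dashboard route, the workflow would look like the sketch below; the log directory path is an assumption (SwanLab writes to `./swanlog` by default):
```shell
# Start the offline dashboard after setting `swanlab: local` in the config;
# the log directory path here is an assumption.
swanlab watch ./swanlog
```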
## Fine-tune from a saved point
If you train as described above, each fine-tuning will start from the beginning. If you want to fine-tune from a half-trained model, you can add a fourth parameter, which can be passed in two ways:
For example, this is an example command to continue fine-tuning from the last saved checkpoint.
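The exact resume syntax is not reproduced here; under the assumption that passing `yes` as the fourth parameter means "resume from the most recent checkpoint", the invocation would be:
```shell
# Assumed resume invocation: `yes` as the fourth parameter asks finetune.py
# to continue from the last saved checkpoint instead of starting over.
python finetune.py data/AdvertiseGen/ THUDM/GLM-4-9B-0414 configs/lora.yaml yes
```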
"content":"{\"books\": [\"Sapiens: A Brief History of Humankind by Yuval Noah Harari\", \"A Brief History of Time by Stephen Hawking\", \"Dune by Frank Herbert\", \"The Martian by Andy Weir\"]}"
},
{
"role":"assistant",
"content":"Based on your interests in history and science fiction, I would recommend the following books: \"Sapiens: A Brief History of Humankind\" by Yuval Noah Harari, \"A Brief History of Time\" by Stephen Hawking, \"Dune\" by Frank Herbert, and \"The Martian\" by Andy Weir."
}
]
}
```
{"messages": [{"role": "system", "content": "", "tools": [{"type": "function", "function": {"name": "get_recommended_books", "description": "Get recommended books based on user's interests", "parameters": {"type": "object", "properties": {"interests": {"type": "array", "items": {"type": "string"}, "description": "The interests to recommend books for"}}, "required": ["interests"]}}}]}, {"role": "user", "content": "Hi, I am looking for some book recommendations. I am interested in history and science fiction."}, {"role": "assistant", "content": "{\"name\": \"get_recommended_books\", \"arguments\": {\"interests\": [\"history\", \"science fiction\"]}}"}, {"role": "observation", "content": "{\"books\": [\"Sapiens: A Brief History of Humankind by Yuval Noah Harari\", \"A Brief History of Time by Stephen Hawking\", \"Dune by Frank Herbert\", \"The Martian by Andy Weir\"]}"}, {"role": "assistant", "content": "Based on your interests in history and science fiction, I would recommend the following books: \"Sapiens: A Brief History of Humankind\" by Yuval Noah Harari, \"A Brief History of Time\" by Stephen Hawking, \"Dune\" by Frank Herbert, and \"The Martian\" by Andy Weir."}]}