# ChatGLM3-6B finetune

This directory provides fine-tuning examples for the ChatGLM3-6B model, including full fine-tuning (SFT), P-Tuning v2, and LoRA. In terms of data format, it provides multi-turn dialogue fine-tuning samples and input/output format fine-tuning samples.

If you have downloaded the model locally, replace `THUDM/chatglm3-6b` in this document and in the code with the local path of the model.

Running the examples requires `python>=3.10`. In addition to the basic `torch` dependency, the example code requires the dependencies listed in `requirements.txt`:

```bash
pip install -r requirements.txt
```

**We provide a [sample notebook](lora_finetune.ipynb) that demonstrates how to use our fine-tuning code.**

## Test hardware standards

We only provide single-node multi-GPU and multi-node multi-GPU running examples, so you will need at least one machine with multiple GPUs. With the **default configuration files** in this repository, we recorded the following GPU memory usage:

+ SFT full fine-tuning: evenly distributed across 4 GPUs, each using `48346MiB` of GPU memory.
+ P-TuningV2 fine-tuning: 1 GPU, using `18426MiB` of GPU memory.
+ LORA fine-tuning: 1 GPU, using `14082MiB` of GPU memory.

> Please note that these results are for reference only; memory usage may differ with different parameters.
> Please make adjustments based on your hardware.

## Multi-turn dialogue format

The multi-turn dialogue fine-tuning example follows the ChatGLM3 dialogue format convention and applies a different `loss_mask` to each role, so that the `loss` for all responses in a multi-turn dialogue is computed in a single pass.

For data files, the example uses the following format.

If you only want to fine-tune the model's conversational capabilities, and not its tool capabilities, organize your data in the following format.
```json
[
  {
    "conversations": [
      {
        "role": "system",
        "content": ""
      },
      {
        "role": "user",
        "content": ""
      },
      {
        "role": "assistant",
        "content": ""
      },
      // ... Multi Turn
      {
        "role": "user",
        "content": ""
      },
      {
        "role": "assistant",
        "content": ""
      }
    ]
  }
  // ...
]
```

**Please note that with a large number of fine-tuning steps, this method will affect the model's tool-calling capability.**

If you wish to fine-tune both the dialogue and the tool capabilities of the model, organize your data in the following format.

```json
[
  {
    "tools": [
      // available tools, format is not restricted
    ],
    "conversations": [
      {
        "role": "system",
        "content": ""
      },
      {
        "role": "user",
        "content": ""
      },
      {
        "role": "assistant",
        "content": ""
      },
      {
        "role": "tool",
        "name": "",
        "parameters": {
          "<parameter_name>": "<parameter_value>"
        },
        "observation": ""
        // does not have to be a string
      },
      {
        "role": "assistant",
        "content": ""
      },
      // ... Multi Turn
      {
        "role": "user",
        "content": ""
      },
      {
        "role": "assistant",
        "content": ""
      }
    ]
  }
  // ...
]
```

- There is no need to manually insert a system prompt describing the tools. During preprocessing, the `tools` field is formatted with `json.dumps(..., ensure_ascii=False)` and inserted as the first system prompt.
- Each role can carry a `loss` field of type `bool`, indicating whether the content predicted for that entry participates in the `loss` calculation. If the field is absent, the sample implementation by default does not calculate `loss` for `system` and `user`, but calculates it for all other roles.
- `tool` is not a native role in ChatGLM3. During preprocessing, a `tool` entry is automatically converted into an `assistant` role with tool-call `metadata` (`loss` is calculated by default) and an `observation` role representing the tool return value (`loss` is not calculated).
- The fine-tuning task for `Code interpreter` has not been implemented yet.
- The `system` role is optional, but if it is present it must appear before the `user` role, and it may appear only once in a complete dialogue (whether single-turn or multi-turn).

## Dataset format example

Here we take the AdvertiseGen dataset as an example. You can download it from [Google Drive](https://drive.google.com/file/d/13_vf0xRTQsyneRKdD1bZIr93vBGOczrk/view?usp=sharing) or [Tsinghua Cloud](https://cloud.tsinghua.edu.cn/f/b3f119a008264b1cabd1/?dl=1). Place the decompressed AdvertiseGen directory in the `data` directory and convert it yourself into a dataset in the following format.

> Please note that the current fine-tuning code uses a validation set. A complete fine-tuning dataset must therefore
> include a training set and a validation set. The test set does not need to be provided, or you can use the
> validation set in its place.

```
{"conversations": [{"role": "user", "content": "Type#skirt*skirt length#skirt"}, {"role": "assistant", "content": "This is a versatile, fashionable fairy skirt, the overall design is very elegant and casual. Every girl can instantly turn into a fairy after wearing it. The material is very light and breathable, making it very comfortable to wear in summer."}]}
```

## Configuration file

Fine-tuning configuration files are located in the `configs` directory and include the following files:

1. `ds_zero_2.json / ds_zero_3.json`: DeepSpeed configuration files.
2. `lora.yaml / ptuning.yaml / sft.yaml`: Configuration files for the different fine-tuning methods, including model parameters, optimizer parameters, training parameters, etc. Some important parameters are explained as follows:
    + data_config section
        + train_file: The file path of the training dataset.
        + val_file: The file path of the validation dataset.
        + test_file: The file path of the test dataset.
        + num_proc: Number of processes used when loading data.
        + max_input_length: The maximum length of the input sequence.
        + max_output_length: The maximum length of the output sequence.
    + training_args section
        + output_dir: Directory for saving the model and other outputs.
        + max_steps: The maximum number of training steps.
        + per_device_train_batch_size: Training batch size per device (e.g. per GPU).
        + dataloader_num_workers: The number of worker processes used when loading data.
        + remove_unused_columns: Whether to remove unused columns from the data.
        + save_strategy: Model saving strategy (for example, saving every fixed number of steps).
        + save_steps: Number of steps between model saves.
        + log_level: Log level (such as info).
        + logging_strategy: Logging strategy.
        + logging_steps: Number of steps between log entries.
        + per_device_eval_batch_size: Evaluation batch size per device.
        + evaluation_strategy: Evaluation strategy (for example, evaluating every fixed number of steps).
        + eval_steps: Number of steps between evaluations.
        + predict_with_generate: Whether to use generate mode for prediction.
    + generation_config section
        + max_new_tokens: The maximum number of new tokens to generate.
    + peft_config section
        + peft_type: The parameter-efficient fine-tuning type used (e.g. LORA).
        + task_type: Task type; here it is causal language modeling (CAUSAL_LM).
    + Lora parameters:
        + r: LoRA rank.
        + lora_alpha: Scaling factor for LoRA.
        + lora_dropout: Dropout probability used in the LoRA layers.
    + P-TuningV2 parameters:
        + num_virtual_tokens: The number of virtual tokens.

## Start fine-tuning

Use the following commands to run in **single-node multi-GPU / multi-node multi-GPU** mode.

```bash
cd finetune_demo
OMP_NUM_THREADS=1 torchrun --standalone --nnodes=1 --nproc_per_node=8 finetune_hf.py data/AdvertiseGen/ THUDM/chatglm3-6b configs/lora.yaml configs/ds_zero_2.json
```

Use the following commands to run in **single-node single-GPU** mode.
```bash
cd finetune_demo
python finetune_hf.py data/AdvertiseGen/ THUDM/chatglm3-6b configs/lora.yaml
```

## Fine-tuning from a checkpoint

If you train using the method above, each fine-tuning run starts from scratch. To continue from a partially trained model, add a fourth argument, which can take one of two forms:

1. `yes`: automatically resume training from the last saved checkpoint.
2. `XX`: a checkpoint number; for example, `600` resumes training from checkpoint 600.

For example, the following continues fine-tuning from the last saved checkpoint:

```bash
cd finetune_demo
python finetune_hf.py data/AdvertiseGen/ THUDM/chatglm3-6b configs/lora.yaml yes
```

## Use the fine-tuned model

### Verify the fine-tuned model in inference_hf.py

You can run the fine-tuned model with `finetune_demo/inference_hf.py` and test it with a single command.

```bash
python inference_hf.py your_finetune_path --prompt your prompt
```

The answer you get this way comes from the fine-tuned model.

### Use the fine-tuned model in other demos in this repo or in external repos

You can use our `lora` and full-parameter fine-tuned models in any demo, as follows:

1. Replace the model-loading code in the demo with the model-loading method from `finetune_demo/inference_hf.py`.

> Please note that for LORA and P-TuningV2 we do not merge the trained models, but instead record the fine-tuning
> path in `adapter_config.json`. If the location of your original model changes, modify
> `base_model_name_or_path` in `adapter_config.json`.

> Please note that we have only tested on GPUs of the NVIDIA Hopper (representative GPU: H100) and Ampere
> (representative GPU: A100) architectures. If you use a GPU with another architecture, you may experience:
>
> 1. Unknown training problems, or GPU memory usage that differs from the figures above.
> 2. The architecture is too old and does not support certain features.
> 3. Problems with inference quality.
>
> These three situations are problems that the community has encountered before. Although the probability is
> extremely low, if you run into them, you can seek a solution in the community.

```python
from pathlib import Path
from typing import Union

from peft import AutoPeftModelForCausalLM
from transformers import AutoModelForCausalLM, AutoTokenizer

# ModelType, TokenizerType and _resolve_path are not shown in this excerpt;
# they are assumed to be defined next to this function in inference_hf.py.


def load_model_and_tokenizer(
        model_dir: Union[str, Path], trust_remote_code: bool = True
) -> tuple[ModelType, TokenizerType]:
    model_dir = _resolve_path(model_dir)
    if (model_dir / 'adapter_config.json').exists():
        # Adapter checkpoint (LoRA / P-Tuning v2): load the adapter on top of its base model.
        model = AutoPeftModelForCausalLM.from_pretrained(
            model_dir, trust_remote_code=trust_remote_code, device_map='auto'
        )
        # The tokenizer comes from the base model recorded in adapter_config.json.
        tokenizer_dir = model.peft_config['default'].base_model_name_or_path
    else:
        # Full-parameter checkpoint: model and tokenizer are in the same directory.
        model = AutoModelForCausalLM.from_pretrained(
            model_dir, trust_remote_code=trust_remote_code, device_map='auto'
        )
        tokenizer_dir = model_dir
    tokenizer = AutoTokenizer.from_pretrained(
        tokenizer_dir, trust_remote_code=trust_remote_code
    )
    return model, tokenizer
```

2. When loading the fine-tuned model, use the location of the fine-tuned model. For example, if your fine-tuned model is at `/path/to/finetune_adapter_model` and the original model is at `path/to/base_model`, use `/path/to/finetune_adapter_model` as `model_dir`.
3. After completing the steps above, you can use the fine-tuned model normally; all other calls remain unchanged.

### Hints

1. Before training starts, the fine-tuning code can print the preprocessing information of the first training sample (this is commented out by default and can be uncommented). It is displayed as:

```log
Sanity Check >>>>>>>>>>>>>
'[gMASK]': 64790 -> -100
'sop': 64792 -> -100
'<|system|>': 64794 -> -100
'': 30910 -> -100
'\n': 13 -> -100
'Answer': 20115 -> -100
'the': 267 -> -100
'following': 1762 -> -100
...
'know': 683 -> -100
'the': 267 -> -100
'response': 3010 -> -100
'details': 3296 -> -100
'.': 30930 -> -100
'<|assistant|>': 64796 -> -100
'': 30910 -> 30910
'\n': 13 -> 13
'I': 307 -> 307
'need': 720 -> 720
'to': 289 -> 289
'use': 792 -> 792
...
<<<<<<<<<<<<< Sanity Check
```

In other words, each line shows, in order, a detokenized string, its `token_id`, and its `target_id`. The `target_id` is the training label for that position: it equals the `token_id` when the token participates in the `loss` calculation, and it is `-100` when the token does not.

2. `_prepare_model_for_training` iterates through all trainable parameters of the model and ensures that their data type is `torch.float32`. This is necessary in some cases because mixed-precision training or other operations may change the data type of the model parameters. This code is enabled by default and can be commented out; however, if you run into problems training in `half` format, switch back to this code, at the cost of possibly higher GPU memory usage.
3. Our [Huggingface model code](https://huggingface.co/THUDM/chatglm3-6b/blob/main/modeling_chatglm.py) contains the following:

```python
if self.gradient_checkpointing and self.training:
    layer_ret = torch.utils.checkpoint.checkpoint(
        layer,
        hidden_states,
        attention_mask,
        rotary_pos_emb,
        kv_caches[index],
        use_cache,
        use_reentrant=False
    )
```

This may increase GPU memory usage during training, so if you run short of GPU memory, you can try changing `use_reentrant` to `True`.

4. The fine-tuned model can be used with any model acceleration framework that supports loading `peft` models. We do not provide a demo here.
5. There are some differences between the fine-tuning dataset format of this repository and the API fine-tuning dataset format:
    + The `messages` field in the ZhipuAI API fine-tuning dataset corresponds to the `conversations` field in this repository.
    + The fine-tuning file for the ZhipuAI API is `jsonl`.
      In this repository, you simply need to change the file extension to `json`.

## Citation

```
@inproceedings{liu2022p,
  title={P-tuning: Prompt tuning can be comparable to fine-tuning across scales and tasks},
  author={Liu, Xiao and Ji, Kaixuan and Fu, Yicheng and Tam, Weng and Du, Zhengxiao and Yang, Zhilin and Tang, Jie},
  booktitle={Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)},
  pages={61--68},
  year={2022}
}

@misc{tang2023toolalpaca,
  title={ToolAlpaca: Generalized Tool Learning for Language Models with 3000 Simulated Cases},
  author={Qiaoyu Tang and Ziliang Deng and Hongyu Lin and Xianpei Han and Qiao Liang and Le Sun},
  year={2023},
  eprint={2306.05301},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}
```
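
## Appendix: checking a dialogue sample

The role rules from the "Multi-turn dialogue format" section can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the preprocessing code of this repository: the helper names `loss_flags` and `check_system_role` are hypothetical, and the sketch treats every entry as a single message, so it does not model how a `tool` entry is split into an `assistant` part and an `observation` part.

```python
from typing import Any

# Roles whose content is excluded from the loss by default, as described in this README.
NO_LOSS_ROLES = {"system", "user"}


def loss_flags(conversations: list[dict[str, Any]]) -> list[bool]:
    """Return, per message, whether its content would count toward the loss (hypothetical helper)."""
    flags = []
    for message in conversations:
        if "loss" in message:
            # An explicit boolean `loss` field overrides the role-based default.
            flags.append(bool(message["loss"]))
        else:
            flags.append(message["role"] not in NO_LOSS_ROLES)
    return flags


def check_system_role(conversations: list[dict[str, Any]]) -> None:
    """Raise ValueError if the `system` role breaks the rules stated in this README (hypothetical helper)."""
    roles = [message["role"] for message in conversations]
    if roles.count("system") > 1:
        raise ValueError("`system` may appear at most once per dialogue")
    if "system" in roles and "user" in roles:
        if roles.index("system") > roles.index("user"):
            raise ValueError("`system` must appear before `user`")


# Made-up sample in the `conversations` format used by this README.
sample = {
    "conversations": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Type#skirt*skirt length#skirt"},
        {"role": "assistant", "content": "A light, breathable skirt for summer."},
    ]
}

check_system_role(sample["conversations"])  # passes silently
print(loss_flags(sample["conversations"]))  # [False, False, True]
```

Running such a check over a data file before training can catch a misplaced `system` entry early, and the printed flags can be compared against the `-100` targets shown by the sanity check output.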