Commit 0a0618ac authored by Rayyyyy

Add some explanation

parent 9aa9b60f
......@@ -62,6 +62,7 @@ pip install -e .
### Fine-tuning with xtuner
1. Install the training libraries; note the required library versions
```bash
pip uninstall flash-attn # 2.0.4+82379d7.abi0.dtk2404.torch2.1
pip install deepspeed-0.12.3+das1.0+gita724046.abi0.dtk2404.torch2.1.0-cp310-cp310-manylinux2014_x86_64.whl
pip install -U xtuner # 0.1.18
pip install mmengine==0.10.3
......@@ -74,7 +75,7 @@ python download_models.py
```
2. In [llama3_8b_instruct_qlora_alpaca_e3_M.py](./llama3_8b_instruct_qlora_alpaca_e3_M.py), set `pretrained_model_name_or_path` and `data_path` to the local model and dataset paths;
3. Adjust `max_length`, `batch_size`, `accumulative_counts`, `max_epochs`, `lr`, `save_steps`, `evaluation_freq`, and the `r` and `lora_alpha` parameters of model.lora according to your hardware and training needs; the defaults fit 4*32G;
4. Set the ${DCU_NUM} parameter to the number of DCU cards to use; for other datasets, modify the `SYSTEM`, `evaluation_inputs`, `dataset_map_fn`, `train_dataloader.sampler`, and `train_cfg` settings in llama3_8b_instruct_qlora_alpaca_e3_M.py (see the comments in the code for details; the alpaca dataset is the default), and **set the model save path with `--work-dir`**
5. Run
```bash
bash finetune.sh
......
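The hyperparameters in step 3 interact: the global batch size seen by the optimizer is the per-card `batch_size` times `accumulative_counts` times the number of DCU cards, which is why all three are tuned together. A minimal sketch of that relationship (the function name and example values are illustrative, not taken from the repo):

```python
def effective_batch_size(batch_size: int, accumulative_counts: int, dcu_num: int) -> int:
    """Global batch = per-card micro-batch * gradient-accumulation steps * cards."""
    return batch_size * accumulative_counts * dcu_num

# e.g. micro-batch 1, 16 accumulation steps, 4 DCU cards
print(effective_batch_size(1, 16, 4))  # -> 64
```

Halving `batch_size` to fit memory while doubling `accumulative_counts` keeps the effective batch size, and therefore the appropriate `lr`, roughly unchanged.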
......@@ -186,7 +186,7 @@ default_hooks = dict(
    # save checkpoint per `save_steps`.
    checkpoint=dict(
        type=CheckpointHook,
        by_epoch=False,  # save checkpoints by steps
        interval=save_steps,
        max_keep_ckpts=save_total_limit),
    # set sampler seed in distributed environment.
......
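With `by_epoch=False`, the hook counts iterations rather than epochs: a checkpoint is written every `save_steps` iterations, and only the newest `max_keep_ckpts` (set from `save_total_limit`) are kept on disk. A toy simulation of that retention behavior (a sketch, not mmengine's actual implementation):

```python
def saved_checkpoints(total_iters: int, save_steps: int, max_keep_ckpts: int) -> list:
    """Return the iteration numbers whose checkpoints remain on disk."""
    kept = []
    for it in range(1, total_iters + 1):
        if it % save_steps == 0:
            kept.append(it)
            if len(kept) > max_keep_ckpts:
                kept.pop(0)  # oldest checkpoint is pruned
    return kept

# 1000 iterations, save every 300, keep at most 2 checkpoints
print(saved_checkpoints(1000, 300, 2))  # -> [600, 900]
```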