update for multinode imp

Commit ef0d50b5 authored Feb 19, 2024 by zhaoying1

Showing with 6 additions and 2 deletions

README.md +5 -1

fine-tune/lora_train.sh +1 -1

--- a/README.md
+++ b/README.md
@@ -87,6 +87,10 @@ site-packages/transformers/utils/versions.py file

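 Before training, modify the `Attention` class in the modeling_baichuan.py file inside the model folder, using [modeling_baichuan.py](./modeling_baichuan.py) as a reference. This mainly (and temporarily) removes the dependency on torch 2.X.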

+
+### Note 3
+If xformers is not supported, multi-node training may fail with an xformers-related error: "ImportError: This modeling file requires the following packages that were not found in your environment: xformers." You can fix this by setting `xops` to None directly in [modeling_baichuan.py](./modeling_baichuan.py): comment out the code that imports xformers and set `xops=None`.
+
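 ## Dataset

 The input data is a JSON file placed in the project's [fine-tune/data](./fine-tune/data) directory, `fine-tune/data/belle_chat_ramdon_10k.json`. This sample contains 10,000 examples drawn from [multiturn_chat_0.8M](https://huggingface.co/datasets/BelleGroup/multiturn_chat_0.8M) and converted to the expected format. It is mainly meant to show how to train on multi-turn data, and results are not guaranteed. An example of the JSON file format: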
@@ -151,7 +155,7 @@ bash run_ft.sh
 1. Single-node training
 ```
 cd fine-tune
-bash run_lora.sh
+bash lora_train.sh
 ```
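The xformers ImportError described in the note above only appears on nodes where xformers cannot be imported. Checking each node before a multi-node launch can therefore help. The following is a minimal sketch that assumes the training interpreter is reachable as `python`, which you can override with the first argument. The function name `check_xformers` is hypothetical and not part of this repository.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of this repository): report whether the given
# Python interpreter on this node can import xformers.
check_xformers() {
  local py="${1:-python}"   # interpreter to test; assumed to be "python" by default
  if ! command -v "$py" >/dev/null 2>&1; then
    echo "no-python"        # interpreter not found on PATH
  elif "$py" -c "import xformers" >/dev/null 2>&1; then
    echo "available"
  else
    echo "missing"          # apply the note: comment out the xformers import, set xops=None
  fi
}

check_xformers
```

Any node that prints `missing` is one where modeling_baichuan.py needs the `xops=None` change.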


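For multi-node runs, the `hostfile` variable in `fine-tune/lora_train.sh` (left empty in this commit) must point to a DeepSpeed hostfile. Each line of that file has the form `<hostname> slots=<number of devices>`. The sketch below writes such a file. The host names `node1`/`node2`, the helper name `write_hostfile`, and the output path are hypothetical placeholders. The slot count of 8 is assumed to match the eight devices listed in `HIP_VISIBLE_DEVICES`.

```shell
#!/usr/bin/env bash
# Sketch: write a DeepSpeed hostfile for a multi-node run.
# Host names, slot count and file name below are hypothetical placeholders.
write_hostfile() {
  local out="$1" slots="$2"
  shift 2                      # remaining arguments are host names
  : > "$out"                   # create or truncate the output file
  local host
  for host in "$@"; do
    # DeepSpeed hostfile format: "<hostname> slots=<devices per node>"
    echo "${host} slots=${slots}" >> "$out"
  done
}

write_hostfile hostfile 8 node1 node2
cat hostfile
```

With that file in place, setting `hostfile="./hostfile"` at the top of `lora_train.sh` would let `deepspeed --hostfile=$hostfile` launch on both nodes. DeepSpeed's multi-node launcher also expects passwordless SSH from the launching node to every listed host.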

--- a/fine-tune/lora_train.sh
+++ b/fine-tune/lora_train.sh
 hostfile=""
 HIP_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 deepspeed --hostfile=$hostfile fine-tune.py  \
    --report_to "none" \
-    --data_path "data/test.json" \
+    --data_path "data/belle_chat_ramdon_10k.json" \
    --model_name_or_path "../../baichuan2-13b-chat-hf" \
    --output_dir "output" \
    --model_max_length 64 \