readme

Commit 4ce7f69d authored Dec 19, 2023 by yuguo-Jack

Showing with 114 additions and 13 deletions

README.md +86 -13

llm/llama/sft_tp_argument.json +28 -0

--- a/README.md
+++ b/README.md
@@ -50,6 +50,8 @@ LLaMA is a collection of foundation language models with parameter counts ranging from 7B to 65B. In
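 ## Dataset
+### Incremental pretraining
 For the detailed data preparation workflow, see [here](https://github.com/PaddlePaddle/PaddleNLP/blob/develop/model_zoo/ernie-1.0/preprocess/README.md). For example, to prepare the OpenWebText2 pretraining data, see [here](https://github.com/PaddlePaddle/PaddleNLP/blob/develop/model_zoo/ernie-1.0/preprocess/docs/OpenWebText2.md).
 To make it easy to run and test this model, this project provides a preprocessed training sample of 100k docs: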
@@ -63,6 +65,14 @@ LLaMA is a collection of foundation language models with parameter counts ranging from 7B to 65B. In
    ├── llama_openwebtext_100k_ids.npy
    └── llama_openwebtext_100k_idx.npz
+### SFT
+```
+cd ./llm/
+wget https://bj.bcebos.com/paddlenlp/datasets/examples/AdvertiseGen.tar.gz
+tar -zxvf AdvertiseGen.tar.gz
+```
 ## Environment setup
 ### Docker
@@ -84,6 +94,8 @@ pip3 install tool_helpers visualdl==2.5.3 -i http://mirrors.aliyun.com/pypi/simp
 ## Training
+### Incremental pretraining
 Weight links
 13B:[https://bj.bcebos.com/paddlenlp/models/community/facebook/llama-13b](https://bj.bcebos.com/paddlenlp/models/community/facebook/llama-13b)
@@ -92,9 +104,15 @@ pip3 install tool_helpers visualdl==2.5.3 -i http://mirrors.aliyun.com/pypi/simp
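 This training script requires 1 node with 8 DCU-Z100L-32G cards per node.
-The parallel configuration uses TP 8 and PP 1 with fp16-precision fine-tuning, configured as follows:
+The parallel configuration uses TP 8 and PP 1 with fp16-precision pretraining, configured as follows: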
 ```
+--model_type "llama" \
+--model_name_or_path "facebook/llama-13b" \
+--tokenizer_name_or_path "facebook/llama-13b" \
+--input_dir "./data" \
+--output_dir "output/$task_name" \
+--split 949,50,1 \
 --max_seq_length 2048 \
 --per_device_train_batch_size 1 \
 --gradient_accumulation_steps 2 \
@@ -127,16 +145,72 @@ pip3 install tool_helpers visualdl==2.5.3 -i http://mirrors.aliyun.com/pypi/simp
 --distributed_dataloader 1
 ```
-Fine-tuning command:
+Incremental pretraining command:
 ```
 cd ./llm/llama/
 bash run_trainer_tp8.sh
 ```
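+Notes:
+1. `continue_training` means training starts from an existing pretrained model. For the 7b and 13b models the initial loss is roughly 1.9x, whereas a randomly initialized model starts from a loss of around 11.x and descends from there.
+2. For multi-machine training, if every machine reads the training data files from the same location (for example, a mounted shared disk), specify `--share_folder true` so that the global card 0 builds the cached data. Otherwise, by default, card 0 on each machine builds its cached data independently.
+3. If the default cache folder `index-cache/` exists in the dataset folder, an additionally specified `--data_cache` has no effect; training loads the contents of the default cache folder first.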
+### SFT
+Weight links
+13B:[https://bj.bcebos.com/paddlenlp/models/community/facebook/llama-13b](https://bj.bcebos.com/paddlenlp/models/community/facebook/llama-13b)
+7B:[https://bj.bcebos.com/paddlenlp/models/community/facebook/llama-7b](https://bj.bcebos.com/paddlenlp/models/community/facebook/llama-7b)
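+This training script requires 1 node with 8 DCU-Z100L-32G cards per node.
+The parallel configuration uses TP 8 and PP 1 with fp16-precision fine-tuning, configured as follows: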
+```
+{
+    "model_name_or_path": "facebook/llama-13b",
+    "dataset_name_or_path": "./data",
+    "output_dir": "./checkpoints/llama_sft_ckpts",
+    "per_device_train_batch_size": 1,
+    "gradient_accumulation_steps": 4,
+    "per_device_eval_batch_size": 4,
+    "eval_accumulation_steps":16,
+    "num_train_epochs": 3,
+    "learning_rate": 3e-05,
+    "warmup_steps": 30,
+    "logging_steps": 1,
+    "evaluation_strategy": "epoch",
+    "save_strategy": "epoch",
+    "src_length": 256,
+    "max_length": 512,
+    "fp16": true,
+    "fp16_opt_level": "O2",
+    "do_train": true,
+    "do_eval": true,
+    "disable_tqdm": true,
+    "load_best_model_at_end": true,
+    "eval_with_do_generation": false,
+    "metric_for_best_model": "accuracy",
+    "recompute": true,
+    "save_total_limit": 1,
+    "tensor_parallel_degree": 8
+  }
+```
+SFT command:
+```
+cd ./llm
+python3 -u  -m paddle.distributed.launch --gpus "0,1,2,3,4,5,6,7" finetune_generation.py ./llama/sft_tp_argument.json
+```
 ## result
-### Accuracy
+### Incremental pretraining accuracy
 Training data: [https://bj.bcebos.com/paddlenlp/models/transformers/llama/data](https://bj.bcebos.com/paddlenlp/models/transformers/llama/data)
@@ -146,17 +220,17 @@ bash run_trainer_tp8.sh
 | Cards | Distributed tool | Convergence |
 | :------: | :------: |:------: |
 | 8 | Paddle |  |
-### input
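+### SFT accuracy
-```plaintext
+Training data: [https://bj.bcebos.com/paddlenlp/datasets/examples/AdvertiseGen.tar.gz](https://bj.bcebos.com/paddlenlp/datasets/examples/AdvertiseGen.tar.gz)
->>>In winter, which city in China is best for escaping the cold? Question description: Can you recommend some cities in China that are suitable for escaping the cold in winter? Answering user: travel enthusiast
-```
+GPGPU used: 8 DCU-Z100L-32G cards.
-### output
+Model accuracy (max_sequence_length: 512):
-```plaintext
+| Cards | Distributed tool | Convergence |
->>>Answer: To escape the cold, of course you should go to Hainan! Winter in Hainan is sunny, the temperature is pleasant, and the air is fresh, with no smog, no sandstorms, no smog, no smog!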
+| :--: | :--------: | :----: |
-```
+|  8   |   Paddle   |        |
 ## benchmark
@@ -194,7 +268,7 @@ cd ./examples/benchmark/lambada
 wget https://paddlenlp.bj.bcebos.com/data/benchmark/lambada_test.jsonl
 ```
-To validate on the LAMBADA dataset, run the following script:
+To evaluate on the LAMBADA dataset, run the following script:
 ```
 python3 eval.py \
@@ -221,5 +295,4 @@ python3 eval.py \
 ## References
-* https://huggingface.co/decapoda-research/llama-13b-hf
 * [https://github.com/PaddlePaddle/PaddleNLP](https://github.com/PaddlePaddle/PaddleNLP)
\ No newline at end of file
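
Review note on the `--split 949,50,1` flag added to the pretraining configuration in README.md: the commit does not say how the three numbers are interpreted. The sketch below assumes they are relative train/validation/test weights that get normalized to fractions, which is a common convention for this kind of flag. The helper name `split_fractions` is hypothetical and is not part of the training script.

```python
from typing import List


def split_fractions(split: str) -> List[float]:
    """Turn a comma-separated weight string into fractions that sum to 1.

    Assumption: the values are relative weights, not absolute sample counts.
    """
    weights = [float(part) for part in split.split(",")]
    if any(w < 0 for w in weights):
        raise ValueError("split weights must be non-negative")
    total = sum(weights)
    if total == 0:
        raise ValueError("at least one split weight must be positive")
    return [w / total for w in weights]


train, valid, test = split_fractions("949,50,1")
print(f"train={train:.3f} valid={valid:.3f} test={test:.3f}")
```

Under this assumption, almost all of the data goes to training, with small validation and test slices held out.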
--- a/llm/llama/sft_tp_argument.json
+++ b/llm/llama/sft_tp_argument.json
+{
+    "model_name_or_path": "facebook/llama-13b",
+    "dataset_name_or_path": "./data",
+    "output_dir": "./checkpoints/llama_sft_ckpts",
+    "per_device_train_batch_size": 1,
+    "gradient_accumulation_steps": 4,
+    "per_device_eval_batch_size": 4,
+    "eval_accumulation_steps":16,
+    "num_train_epochs": 3,
+    "learning_rate": 3e-05,
+    "warmup_steps": 30,
+    "logging_steps": 1,
+    "evaluation_strategy": "epoch",
+    "save_strategy": "epoch",
+    "src_length": 256,
+    "max_length": 512,
+    "fp16": true,
+    "fp16_opt_level": "O2",
+    "do_train": true,
+    "do_eval": true,
+    "disable_tqdm": true,
+    "load_best_model_at_end": true,
+    "eval_with_do_generation": false,
+    "metric_for_best_model": "accuracy",
+    "recompute": true,
+    "save_total_limit": 1,
+    "tensor_parallel_degree": 8
+  }
\ No newline at end of file
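
Review note on `sft_tp_argument.json`: with `tensor_parallel_degree` set to 8 on a single node of 8 cards, all cards cooperate on one model replica, so the number of samples per optimizer step is small. The sketch below estimates that number using the usual convention of per-device batch size times gradient accumulation steps times data-parallel degree. It assumes no pipeline or sharding parallelism, and the helper name `global_batch_size` is hypothetical; only a subset of the keys from the file is reproduced.

```python
import json

# Subset of the keys from sft_tp_argument.json as added in this commit.
SFT_CONFIG_TEXT = """
{
    "per_device_train_batch_size": 1,
    "gradient_accumulation_steps": 4,
    "tensor_parallel_degree": 8
}
"""


def global_batch_size(config: dict, num_devices: int) -> int:
    """Estimate the number of samples consumed per optimizer step.

    Assumption: the devices are split only between tensor parallelism and
    data parallelism (no pipeline or sharding parallelism).
    """
    tp_degree = config.get("tensor_parallel_degree", 1)
    if num_devices % tp_degree != 0:
        raise ValueError("num_devices must be a multiple of tensor_parallel_degree")
    data_parallel_degree = num_devices // tp_degree
    return (
        config["per_device_train_batch_size"]
        * config["gradient_accumulation_steps"]
        * data_parallel_degree
    )


config = json.loads(SFT_CONFIG_TEXT)
print(global_batch_size(config, num_devices=8))
```

For the 8-card setup described in the README, the data-parallel degree works out to 1, so under these assumptions each optimizer step sees 4 samples.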