Commit a540af5e authored by Rayyyyy

update README to dtk2404

parent a990c13a
Llama-3 adopts a relatively standard decoder-only transformer architecture.
- Uses grouped query attention (GQA), masking, and related techniques to help developers get excellent performance at the lowest possible energy cost.
- Trains the model on sequences of 8,192 tokens, using a mask to ensure that self-attention does not cross document boundaries (see the sketch below this list).
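The two bullets above can be made concrete with a small PyTorch sketch (illustrative only, not Meta's implementation; all names and shapes below are assumptions for the example): `grouped_query_attention` shares one key/value head across a group of query heads, and `doc_boundary_causal_mask` builds the block-diagonal causal mask that keeps self-attention inside each document when several documents are packed into one 8,192-token sequence.

```python
import torch

def doc_boundary_causal_mask(doc_ids: torch.Tensor) -> torch.Tensor:
    # doc_ids: (batch, seq_len), integer id of the document each token belongs to
    same_doc = doc_ids.unsqueeze(-1) == doc_ids.unsqueeze(-2)        # (b, s, s)
    causal = torch.tril(torch.ones(doc_ids.size(-1), doc_ids.size(-1),
                                   dtype=torch.bool, device=doc_ids.device))
    return same_doc & causal                                          # True = may attend

def grouped_query_attention(q, k, v, mask=None):
    # q: (b, n_q_heads, s, d); k, v: (b, n_kv_heads, s, d), n_q_heads % n_kv_heads == 0
    group = q.size(1) // k.size(1)
    k = k.repeat_interleave(group, dim=1)                             # one K/V head serves
    v = v.repeat_interleave(group, dim=1)                             # `group` query heads
    scores = (q @ k.transpose(-2, -1)) / q.size(-1) ** 0.5
    if mask is not None:
        scores = scores.masked_fill(~mask.unsqueeze(1), float("-inf"))
    return scores.softmax(dim=-1) @ v

# Toy usage: two documents packed into one 6-token sequence
b, s, d, n_q, n_kv = 1, 6, 8, 4, 2
doc_ids = torch.tensor([[0, 0, 0, 1, 1, 1]])
mask = doc_boundary_causal_mask(doc_ids)
q = torch.randn(b, n_q, s, d)
k = torch.randn(b, n_kv, s, d)
v = torch.randn(b, n_kv, s, d)
out = grouped_query_attention(q, k, v, mask)                          # (1, 4, 6, 8)
```

Sharing K/V heads shrinks the KV cache roughly by the ratio of query heads to key/value heads, which is where the inference-time efficiency of GQA comes from.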
## Algorithm Overview
<div align=center>
<img src="./doc/method.png"/>
</div>
## Environment Setup
Modify the `-v` mount paths, `docker_name`, and `imageID` below according to your actual environment.
**Note**: the bitsandbytes library is not fully functional here; quantization-related features are not yet supported.
### Docker (Method 1)
```bash
docker pull image.sourcefind.cn:5000/dcu/admin/base/pytorch:2.1.0-centos7.6-dtk24.04-py310
docker run -it -v /path/your_code_data/:/path/your_code_data/ -v /opt/hyhal/:/opt/hyhal/:ro --shm-size=80G --privileged=true --device=/dev/kfd --device=/dev/dri/ --group-add video --name docker_name imageID bash
cd /your_code_path/llama3_pytorch
pip install -e .
```
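After the container is running and `pip install -e .` has finished, it may be worth confirming that the DCU devices are visible to PyTorch. A minimal check, assuming the dtk build of torch exposes DCUs through the usual `torch.cuda` interface (as the ROCm/HIP builds do):

```python
import torch

print(torch.__version__)          # expect a 2.1.0 build matching dtk24.04
print(torch.cuda.is_available())  # True if the DCUs passed via --device are visible
print(torch.cuda.device_count())  # number of visible DCUs
```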
### Dockerfile (Method 2)
```bash
cd docker
docker build --no-cache -t llama3:latest .
docker run -it -v /path/your_code_data/:/path/your_code_data/ -v /opt/hyhal/:/opt/hyhal/:ro --shm-size=80G --privileged=true --device=/dev/kfd --device=/dev/dri/ --group-add video --name docker_name imageID bash
cd /your_code_path/llama3_pytorch
pip install -e .
```
### Anaconda (Method 3)
The special deep learning libraries required by this project for DCU GPUs can be downloaded and installed from the [光合](https://developer.hpccube.com/tool/) developer community.
```bash
DTK driver: dtk24.04
python: python3.10
torch: 2.1.0
xtuner: 0.1.18
```
Other non-deep-learning libraries can be installed as follows:
```bash
pip install -e .
```
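To confirm the environment matches the version list above, a quick sketch; note that xtuner, mmengine, and deepspeed are only installed later (step 1 of the fine-tuning section), and the expected versions in the comments are assumptions taken from that list:

```python
from importlib.metadata import PackageNotFoundError, version

import torch

print("torch:", torch.__version__)              # expect a 2.1.0 build for dtk24.04
for pkg in ("xtuner", "mmengine", "deepspeed"):
    try:
        print(pkg + ":", version(pkg))          # expect 0.1.18 / 0.10.3 / 0.12.3
    except PackageNotFoundError:
        print(pkg + ": not installed yet (see the fine-tuning section below)")
```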
## Dataset
## Training
### Fine-tuning with xtuner
1. Install the training libraries; note the required versions:
```bash
pip install deepspeed-0.12.3+das1.0+gita724046.abi0.dtk2404.torch2.1.0-cp310-cp310-manylinux2014_x86_64.whl
pip install -U xtuner # 0.1.18
pip install mmengine==0.10.3
```
2. Download the pretrained model; edit `download_models.py` to choose the specific model:
```bash
cd /your_code_path/llama3_pytorch
pip install modelscope
python download_models.py
mv ./LLM-Research/* ./
```
3. In [llama3_8b_instruct_qlora_alpaca_e3_M.py](./llama3_8b_instruct_qlora_alpaca_e3_M.py), set `pretrained_model_name_or_path` and `data_path` to the corresponding local paths;
4. Adjust `max_length`, `batch_size`, `accumulative_counts`, `max_epochs`, `lr`, `save_steps`, `evaluation_freq`, and the `r` and `lora_alpha` parameters in model.lora according to your hardware and training needs; the default parameters fit 4*32G cards (a sketch of these edits follows this list);
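For orientation, the values in steps 3 and 4 are plain Python variables near the top of the xtuner config. The following is a hypothetical sketch of the kind of edits involved; the paths and values are placeholders, not the defaults shipped in `llama3_8b_instruct_qlora_alpaca_e3_M.py`:

```python
# Excerpt-style sketch of an xtuner config (all values are placeholders)
pretrained_model_name_or_path = '/your_model_path/Meta-Llama-3-8B-Instruct'
data_path = '/your_data_path/your_dataset.json'

max_length = 2048          # max tokens per training sample
batch_size = 1             # per-device batch size
accumulative_counts = 16   # gradient accumulation steps
max_epochs = 3
lr = 2e-4
save_steps = 500
evaluation_freq = 500

# LoRA settings referenced by model.lora; lowering r / lora_alpha reduces memory use
lora_r = 64
lora_alpha = 16
```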
Directory layout of the downloaded Meta-Llama-3-70B weights (`original` checkpoint shards):
```
│ ├── Meta-Llama-3-70B
│   ├── original
│     ├── consolidated.00.pth
│     ...
│     ├── consolidated.07.pth
│     ├── params.json
│     └── tokenizer.model
```

Directory layout of the downloaded Meta-Llama-3-70B-Instruct weights:
```
│ └── Meta-Llama-3-70B-Instruct
│   ├── original
│     ├── consolidated.00.pth
│     ...
│     ├── consolidated.07.pth
│     ├── params.json
│     └── tokenizer.model
```
## References
- https://github.com/meta-llama/llama3
- https://github.com/InternLM/xtuner
- https://github.com/meta-llama/llama-recipes
FROM image.sourcefind.cn:5000/dcu/admin/base/pytorch:2.1.0-centos7.6-dtk24.04-py310
from modelscope import snapshot_download
# Models available for download:
# LLM-Research/Meta-Llama-3-8B, LLM-Research/Meta-Llama-3-8B-Instruct
# LLM-Research/Meta-Llama-3-70B, LLM-Research/Meta-Llama-3-70B-Instruct
# The example below uses LLM-Research/Meta-Llama-3-8B-Instruct
model_dir = snapshot_download('LLM-Research/Meta-Llama-3-8B-Instruct', cache_dir="/your_model_save_path/")
print(model_dir)
#!/bin/bash
ulimit -u 200000                     # raise the per-user process limit for multi-node training
# NCCL / MIOpen environment settings for multi-node DCU training
export OMP_NUM_THREADS=1
export NCCL_DEBUG=INFO
export MIOPEN_FIND_MODE=3
export HSA_FORCE_FINE_GRAIN_PCIE=1
export MIOPEN_COMPILE_PARALLEL_LEVEL=1
export NCCL_PLUGIN_P2P=ucx
export NCCL_SOCKET_IFNAME=ib0
export NCCL_P2P_LEVEL=5
export NCCL_NET_PLUGIN=none
echo "START TIME: $(date)"
hostfile=./hostfile
np=$(cat $hostfile | sort | uniq | wc -l)    # number of unique nodes in the hostfile
np=$(($np*8))                                # 8 ranks (DCUs) per node
nodename=$(cat $hostfile | sed -n "1p")      # first node in the hostfile
dist_url=`echo $nodename | awk '{print $1}'`
which mpirun
# launch $np ranks across the nodes listed in $hostfile; each rank runs run_train_single.sh
mpirun -np $np --allow-run-as-root --hostfile $hostfile --bind-to none --mca btl_tcp_if_include $dist_url run_train_single.sh
echo "END TIME: $(date)"
#!/bin/bash
export HSA_FORCE_FINE_GRAIN_PCIE=1
export MIOPEN_FIND_MODE=3
export MIOPEN_COMPILE_PARALLEL_LEVEL=1
export NCCL_PLUGIN_P2P=ucx
export NCCL_SOCKET_IFNAME=ib0
export NCCL_P2P_LEVEL=5
export NCCL_IB_HCA=mlx5_0
export NCCL_DEBUG=INFO
export NCCL_NET_PLUGIN=none
lrank=$OMPI_COMM_WORLD_LOCAL_RANK       # local rank on this node, set by Open MPI
echo "LRANK===============================$lrank"
RANK=$OMPI_COMM_WORLD_RANK
WORLD_SIZE=$OMPI_COMM_WORLD_SIZE
export HIP_VISIBLE_DEVICES=0,1,2,3      # overridden per local rank in the case block below
LR=1e-5
# Training command executed by every rank (entry point ../main.py)
APP="python3 ../main.py \
--deepspeed ../deepspeed.json \
--do_train \
--train_file AdvertiseGen/train.json \
--prompt_column content \
--response_column summary \
--model_name_or_path THUDM/chatglm-6b \
--output_dir ./output_ft/pretrain \
--overwrite_output_dir \
--max_source_length 64 \
--max_target_length 64 \
--per_device_train_batch_size 1 \
--per_device_eval_batch_size 1 \
--gradient_accumulation_steps 1 \
--predict_with_generate \
--max_steps 2000 \
--logging_steps 5 \
--save_steps 1000 \
--learning_rate $LR \
--fp16 \
--local_rank $lrank "
# Bind each local rank to a dedicated InfiniBand HCA (mlx5_N) and NUMA node
case ${lrank} in
[0])
export HIP_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
export UCX_NET_DEVICES=mlx5_0:1
export UCX_IB_PCI_BW=mlx5_0:50Gbs
numactl --cpunodebind=0 --membind=0 ${APP}
;;
[1])
export HIP_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
export UCX_NET_DEVICES=mlx5_1:1
export UCX_IB_PCI_BW=mlx5_1:50Gbs
numactl --cpunodebind=0 --membind=0 ${APP}
;;
[2])
export HIP_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
export UCX_NET_DEVICES=mlx5_2:1
export UCX_IB_PCI_BW=mlx5_2:50Gbs
numactl --cpunodebind=0 --membind=0 ${APP}
;;
[3])
export HIP_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
export UCX_NET_DEVICES=mlx5_3:1
export UCX_IB_PCI_BW=mlx5_3:50Gbs
numactl --cpunodebind=0 --membind=0 ${APP}
;;
[4])
export HIP_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
export UCX_NET_DEVICES=mlx5_4:1
export UCX_IB_PCI_BW=mlx5_4:50Gbs
numactl --cpunodebind=3 --membind=3 ${APP}
;;
[5])
export HIP_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
export UCX_NET_DEVICES=mlx5_5:1
export UCX_IB_PCI_BW=mlx5_5:50Gbs
numactl --cpunodebind=3 --membind=3 ${APP}
;;
[6])
export HIP_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
export UCX_NET_DEVICES=mlx5_6:1
export UCX_IB_PCI_BW=mlx5_6:50Gbs
numactl --cpunodebind=3 --membind=3 ${APP}
;;
[7])
export HIP_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
export UCX_NET_DEVICES=mlx5_7:1
export UCX_IB_PCI_BW=mlx5_7:50Gbs
numactl --cpunodebind=3 --membind=3 ${APP}
;;
esac