Commit 3032ab5b authored by mashun1

update

parent ef1350fe
@@ -85,7 +85,9 @@ pip install xentropy_cuda_lib-0.1_torch2.1_dtk2310-cp38-cp38-linux_x86_64.whl
## Dataset
`openassistant-guanaco`
-- https://huggingface.co/datasets/timdettmers/openassistant-guanaco/tree/main
+[huggingface](https://huggingface.co/datasets/timdettmers/openassistant-guanaco/tree/main) | [SCNet](http://113.200.138.88:18080/aidatasets/openassistant-guanaco) high-speed channel
The project already includes a mini dataset for finetuning. Its directory structure is as follows:
```
......
```
@@ -96,16 +98,18 @@ timdettmers/
The datasets that the official site provides for training from scratch are listed below. For preprocessing the full datasets, see [`PRETRAIN.md`](./PRETRAIN.md).
`SlimPajama-627B`
-- https://huggingface.co/datasets/cerebras/SlimPajama-627B
+[huggingface](https://huggingface.co/datasets/cerebras/SlimPajama-627B) | [SCNet](http://113.200.138.88:18080/aidatasets/cerebras/SlimPajama-627B) high-speed channel
`starcoderdata`
-- https://huggingface.co/datasets/bigcode/starcoderdata
+[huggingface](https://huggingface.co/datasets/bigcode/starcoderdata) | [SCNet](http://113.200.138.88:18080/aidatasets/bigcode/starcoderdata) high-speed channel
`For more information, see README_origin.md from the source project`
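For readers who prefer to script the download, the Hugging Face links in this section can be reduced to repo IDs. The following is a minimal sketch, not part of the project: the helper names `dataset_repo_id` and `download_dataset` are hypothetical, and the download step assumes the `huggingface_hub` package is installed (it is imported lazily so the parsing helper works without it). The from-scratch corpora are large, and some datasets may require authentication, so check disk space and access before calling the download function.

```python
from urllib.parse import urlparse

# Dataset pages linked in this README.
DATASET_URLS = [
    "https://huggingface.co/datasets/timdettmers/openassistant-guanaco/tree/main",
    "https://huggingface.co/datasets/cerebras/SlimPajama-627B",
    "https://huggingface.co/datasets/bigcode/starcoderdata",
]


def dataset_repo_id(url: str) -> str:
    """Extract '<owner>/<name>' from a huggingface.co dataset URL."""
    parts = [p for p in urlparse(url).path.split("/") if p]
    if len(parts) < 3 or parts[0] != "datasets":
        raise ValueError(f"not a Hugging Face dataset URL: {url}")
    return f"{parts[1]}/{parts[2]}"


def download_dataset(url: str, local_dir: str) -> str:
    """Download a dataset snapshot (hypothetical helper, assumes huggingface_hub)."""
    from huggingface_hub import snapshot_download  # lazy import: optional dependency

    return snapshot_download(
        repo_id=dataset_repo_id(url),
        repo_type="dataset",
        local_dir=local_dir,
    )


if __name__ == "__main__":
    # Only prints the repo IDs; nothing is downloaded here.
    for url in DATASET_URLS:
        print(dataset_repo_id(url))
```

Calling `download_dataset(DATASET_URLS[0], "data/openassistant-guanaco")` would fetch the finetuning dataset into a local directory of your choice; the target path here is only an illustration.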
## Training
### Single node, multiple GPUs (finetune)
```
-# Download URL for the pretrained weights needed for finetuning (the weights are large and must be downloaded from hf): https://huggingface.co/TinyLlama/TinyLlama-1.1B-intermediate-step-240k-503b
+# For the pretrained weights needed for finetuning, see the `Pretrained weights` section
# This step uses the 503b pretrained weights. After downloading, place them under the PY007 directory: PY007/TinyLlama-1.1B-intermediate-step-240k-503b
cd TinyLlama
sh sft/script.sh # full-parameter finetune
```
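Because the finetune step expects the weights at a fixed path, a quick check before launching can save a failed run. The sketch below is an illustration only: `weights_ready` is a hypothetical helper, and it assumes the script is run from the directory that contains `PY007`. It only verifies that the directory exists and holds at least one file; it does not validate the checkpoint itself.

```python
from pathlib import Path

# Directory named in the finetune instructions above.
WEIGHTS_DIR = Path("PY007") / "TinyLlama-1.1B-intermediate-step-240k-503b"


def weights_ready(weights_dir: Path = WEIGHTS_DIR) -> bool:
    """Return True if the directory exists and contains at least one file."""
    return weights_dir.is_dir() and any(p.is_file() for p in weights_dir.iterdir())


if __name__ == "__main__":
    if weights_ready():
        print(f"Found weights in {WEIGHTS_DIR}; next step: sh sft/script.sh")
    else:
        print(f"Weights missing; download them into {WEIGHTS_DIR} first")
```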
@@ -139,6 +143,11 @@ Assistant: Well, I really don't want him to be president because of his position
`Dialogue question answering`
### Popular application industries
`Manufacturing, broadcast media, finance, energy, healthcare, home furnishing, education`
+## Pretrained weights
+[huggingface](https://huggingface.co/TinyLlama/TinyLlama-1.1B-intermediate-step-240k-503b) | [SCNet](http://113.200.138.88:18080/aimodels/tinyllama/TinyLlama-1.1B-intermediate-step-240k-503b) high-speed channel
## Source repository and issue feedback
- https://developer.hpccube.com/codes/modelzoo/tinyllama_pytorch
## References
......
icon.png

53.8 KB
