ModelZoo / llama3_pytorch / Commits

Commit 917e35e3, authored Apr 24, 2024 by Rayyyyy

add 70B and xtuner finetune.

parent 225f11a9
Showing 4 changed files with 110611 additions and 15 deletions (+110611 −15):

- README.md (+71 −15)
- bitsandbytes-0.43.0-py3-none-any.whl (+0 −0)
- datasets/multi_turn_dataset_2.json (+110310 −0)
- llama3_8b_instruct_qlora_alpaca_e3_M.py (+230 −0)
README.md
@@ -3,7 +3,11 @@

[llama3](https://llama.meta.com/llama3/)
## Model Architecture

Llama 3 uses a relatively standard decoder-only Transformer architecture. Compared with Llama 2, several key improvements were made:

- Trained on more than 15T tokens, over seven times the size of the Llama 2 dataset, strengthening reasoning, code generation, and instruction following;
- Supports 8K-token context (up from 4K), and an improved tokenizer with a 128K-token vocabulary encodes language more efficiently, which substantially improves model performance;
- Uses grouped-query attention (GQA) and masking techniques to help developers get the best possible performance at the lowest energy cost;
- Models are trained on sequences of 8,192 tokens, with a mask ensuring that self-attention does not cross document boundaries.
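The sketch below is a minimal, generic illustration of the grouped-query attention idea (not this repository's implementation): a small number of key/value heads are shared across groups of query heads by repeating them before the attention product, which shrinks the KV cache without reducing the number of query heads.

```python
import torch

def repeat_kv(x: torch.Tensor, n_rep: int) -> torch.Tensor:
    # x: (batch, seq, n_kv_heads, head_dim). In GQA, n_kv_heads < n_heads and each
    # K/V head is shared by n_rep = n_heads // n_kv_heads query heads.
    if n_rep == 1:
        return x
    b, s, n_kv, d = x.shape
    return (
        x[:, :, :, None, :]
        .expand(b, s, n_kv, n_rep, d)
        .reshape(b, s, n_kv * n_rep, d)
    )

# Example: 8 query heads attending over 2 shared K/V heads.
keys = torch.randn(1, 16, 2, 64)       # (batch, seq, n_kv_heads, head_dim)
expanded = repeat_kv(keys, n_rep=4)    # (1, 16, 8, 64) — ready to pair with 8 query heads
```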
## Algorithm Principles

@@ -22,6 +26,8 @@ docker run -it -v /path/your_code_data/:/path/your_code_data/ -v /opt/hyhal/:/op
```bash
cd /your_code_path/llama3_pytorch
pip install -e .
pip install -U xtuner
pip install bitsandbytes-0.43.0-py3-none-any.whl
```
### Dockerfile (Method 2)

@@ -33,27 +39,38 @@ docker run -it -v /path/your_code_data/:/path/your_code_data/ -v /opt/hyhal/:/op
```bash
cd /your_code_path/llama3_pytorch
pip install -e .
pip install -U xtuner
pip install bitsandbytes-0.43.0-py3-none-any.whl
```
### Anaconda (Method 3)

The special deep-learning libraries that this project requires for DCU GPUs can be downloaded and installed from the [光合](https://developer.hpccube.com/tool/) developer community.
```bash
DTK driver: dtk23.10.1
python: python3.8
torch: 2.1.0
```
`Tips: the DTK driver, python, and torch versions above are DCU-specific and must match each other exactly.`

Other non-deep-learning dependencies can be installed as follows:
```bash
pip install -e .
pip install -U xtuner
pip install bitsandbytes-0.43.0-py3-none-any.whl
```
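Because the DTK, python, and torch versions must match exactly, it can be worth checking the installed build before training. A minimal check, assuming the DCU torch build exposes the usual `torch.cuda` interface:

```python
import sys
import torch

# These should line up with the versions listed above (python3.8 / torch 2.1.0).
print("python:", sys.version.split()[0])
print("torch :", torch.__version__)
print("accelerator available:", torch.cuda.is_available())
```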
## Dataset

No official dataset is provided.
## Training

### Fine-tuning with xtuner

1. In [llama3_8b_instruct_qlora_alpaca_e3_M.py](./llama3_8b_instruct_qlora_alpaca_e3_M.py), set `pretrained_model_name_or_path` and `data_path` to the corresponding local model and data paths.
2. Adjust `max_length`, `batch_size`, `accumulative_counts`, `max_epochs`, `lr`, `save_steps`, `evaluation_freq`, and the `r` and `lora_alpha` parameters in `model.lora` according to your hardware and training needs.
3. Set `${DCU_NUM}` to the number of DCU cards to use.
4. Run:

```bash
NPROC_PER_NODE=${DCU_NUM} xtuner train ./llama3_8b_instruct_qlora_alpaca_e3_M.py --deepspeed deepspeed_zero2
```
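Training produces checkpoints under xtuner's work directory; these are usually converted into a Hugging Face-format LoRA adapter (xtuner ships a `pth_to_hf` conversion utility) before being used for inference. The snippet below is a minimal loading sketch, assuming the converted adapter was saved to `./work_dirs/llama3_8b_qlora_adapter` (a hypothetical path):

```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_path = '/home/llama3/Meta-Llama-3-8B-Instruct'    # same path as in the training config
adapter_path = './work_dirs/llama3_8b_qlora_adapter'   # hypothetical converted-adapter directory

tokenizer = AutoTokenizer.from_pretrained(base_path, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    base_path, torch_dtype=torch.float16, trust_remote_code=True)
model = PeftModel.from_pretrained(model, adapter_path)  # attach the QLoRA adapter weights
model.eval()
```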
## Inference

For how to download the pretrained models, see the [pretrained weights](#预训练权重) section below. Different models require different model-parallel (MP) values, as shown in the table below:
@@ -61,6 +78,7 @@ pip install -e .

| Model | MP |
|--------|----|
| 8B | 1 |
| 70B | 8 |

All models support sequence lengths of up to 8,192 tokens, but the cache is pre-allocated based on the `max_seq_len` and `max_batch_size` values, so set those according to your hardware.
@@ -69,9 +87,9 @@ pip install -e .

- Set the `max_seq_len` and `max_batch_size` parameters as needed.
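For reference, the upstream llama3 inference scripts build the generator roughly as follows; the KV cache is pre-allocated for `max_batch_size × max_seq_len`, and the number of processes launched by `torchrun` must match the MP value in the table above. A sketch based on the upstream `Llama.build` API (treat the exact signature as an assumption):

```python
from llama import Llama  # installed by `pip install -e .` in this repository

# Cache memory scales with max_batch_size * max_seq_len, so keep both as small
# as your workload allows on DCU memory.
generator = Llama.build(
    ckpt_dir="Meta-Llama-3-8B/original/",
    tokenizer_path="Meta-Llama-3-8B/original/tokenizer.model",
    max_seq_len=512,
    max_batch_size=4,
)
results = generator.text_completion(
    ["The theory of relativity states that"], max_gen_len=64)
print(results[0]["generation"])
```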
### Pretrained models

These models have not been fine-tuned for chat or Q&A. See `example_text_completion.py` for example usage.
- Example for the Meta-Llama-3-8B model; for Meta-Llama-3-70B, simply point `--ckpt_dir` and `--tokenizer_path` at the corresponding model paths.
```bash
torchrun --nproc_per_node 1 example_text_completion.py \
    --ckpt_dir Meta-Llama-3-8B/original/ \
    ...
```

@@ -80,15 +98,15 @@ torchrun --nproc_per_node 1 example_text_completion.py \
### Instruction-tuned models
The fine-tuned models are trained for dialogue applications. To get the expected behavior and performance, the specific format defined in [`ChatFormat`](https://github.com/meta-llama/llama3/blob/main/llama/tokenizer.py#L202) must be followed:
- The prompt starts with the special token `<|begin_of_text|>`, followed by one or more messages.
- Each message starts with the `<|start_header_id|>` tag, the role (`system`, `user`, or `assistant`), and the `<|end_header_id|>` tag.
- The content of the message follows after a double newline `\n\n`.
- The end of each message is marked by the `<|eot_id|>` token.
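Concretely, a prompt in this format can be assembled by hand. The snippet below is a hand-written illustration of the format described above (not the repository's `ChatFormat` class); it ends with an open assistant header so the model generates the reply:

```python
def format_llama3_chat(messages):
    # messages: list of {"role": "system"|"user"|"assistant", "content": str}
    prompt = "<|begin_of_text|>"
    for msg in messages:
        prompt += f"<|start_header_id|>{msg['role']}<|end_header_id|>\n\n"
        prompt += msg["content"] + "<|eot_id|>"
    # Leave the assistant header open so the model completes the next turn.
    prompt += "<|start_header_id|>assistant<|end_header_id|>\n\n"
    return prompt

print(format_llama3_chat([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is the capital of France?"},
]))
```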
You can also deploy additional classifiers to filter out inputs and outputs that are deemed unsafe. See the [llama-recipes repo](https://github.com/meta-llama/llama-recipes/blob/main/recipes/inference/local_inference/inference.py) for an example of adding a safety checker to the inputs and outputs of the inference code.
- Example for the Meta-Llama-3-8B-Instruct model; for Meta-Llama-3-70B-Instruct, simply point `--ckpt_dir` and `--tokenizer_path` at the corresponding model paths.
```bash
torchrun --nproc_per_node 1 example_chat_completion.py \
    --ckpt_dir Meta-Llama-3-8B-Instruct/original/ \
    ...
```

@@ -139,17 +157,53 @@ mkdir Meta-Llama-3-8B-Instruct
```bash
huggingface-cli download meta-llama/Meta-Llama-3-8B-Instruct --include "original/*" --local-dir Meta-Llama-3-8B-Instruct --token hf_*
```
- Meta-Llama-3-70B model

```bash
mkdir Meta-Llama-3-70B
huggingface-cli download meta-llama/Meta-Llama-3-70B --include "original/*" --local-dir Meta-Llama-3-70B --token hf_*
```

- Meta-Llama-3-70B-Instruct model

```bash
mkdir Meta-Llama-3-70B-Instruct
huggingface-cli download meta-llama/Meta-Llama-3-70B-Instruct --include "original/*" --local-dir Meta-Llama-3-70B-Instruct --token hf_*
```
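The same weights can also be fetched from Python with `huggingface_hub`, which may be convenient in scripted setups. A sketch assuming `huggingface_hub` is installed and the token has been granted access to the Llama 3 repositories:

```python
from huggingface_hub import snapshot_download

# Download only the original/ checkpoint files, mirroring the CLI commands above.
snapshot_download(
    repo_id="meta-llama/Meta-Llama-3-8B-Instruct",
    allow_patterns="original/*",
    local_dir="Meta-Llama-3-8B-Instruct",
    token="hf_***",  # replace with your own access token
)
```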
The model directory structure is as follows:

```bash
├── llama3_pytorch
│   ├── Meta-Llama-3-8B
│   │   └── original
│   │       ├── consolidated.00.pth
│   │       ├── params.json
│   │       └── tokenizer.model
│   ├── Meta-Llama-3-8B-Instruct
│   │   └── original
│   │       ├── consolidated.00.pth
│   │       ├── params.json
│   │       └── tokenizer.model
│   ├── Meta-Llama-3-70B
│   │   └── original
│   │       ├── consolidated.00.pth
│   │       ├── consolidated.01.pth
│   │       ├── consolidated.02.pth
│   │       ├── consolidated.03.pth
│   │       ├── consolidated.04.pth
│   │       ├── consolidated.05.pth
│   │       ├── consolidated.06.pth
│   │       ├── consolidated.07.pth
│   │       ├── params.json
│   │       └── tokenizer.model
│   └── Meta-Llama-3-70B-Instruct
│       └── original
│           ├── consolidated.00.pth
│           ├── consolidated.01.pth
│           ├── consolidated.02.pth
│           ├── consolidated.03.pth
│           ├── consolidated.04.pth
│           ├── consolidated.05.pth
│           ├── consolidated.06.pth
│           ├── consolidated.07.pth
│           ├── params.json
│           └── tokenizer.model
```
@@ -159,3 +213,5 @@ huggingface-cli download meta-llama/Meta-Llama-3-8B-Instruct --include "original

## References

- https://github.com/meta-llama/llama3
- https://github.com/InternLM/xtuner
- https://github.com/SmartFlowAI/EmoLLM
bitsandbytes-0.43.0-py3-none-any.whl 0 → 100644

File added
datasets/multi_turn_dataset_2.json 0 → 100644 (diff collapsed)
llama3_8b_instruct_qlora_alpaca_e3_M.py 0 → 100644
# Copyright (c) OpenMMLab. All rights reserved.
import torch
from datasets import load_dataset
from mmengine.dataset import DefaultSampler
from mmengine.hooks import (CheckpointHook, DistSamplerSeedHook, IterTimerHook,
                            LoggerHook, ParamSchedulerHook)
from mmengine.optim import AmpOptimWrapper, CosineAnnealingLR, LinearLR
from peft import LoraConfig
from torch.optim import AdamW
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          BitsAndBytesConfig)

from xtuner.dataset import process_hf_dataset
from xtuner.dataset.collate_fns import default_collate_fn
from xtuner.dataset.map_fns import alpaca_map_fn, template_map_fn_factory
from xtuner.engine.hooks import (DatasetInfoHook, EvaluateChatHook,
                                 VarlenAttnArgsToMessageHubHook)
from xtuner.engine.runner import TrainLoop
from xtuner.model import SupervisedFinetune
from xtuner.parallel.sequence import SequenceParallelSampler
from xtuner.utils import PROMPT_TEMPLATE, SYSTEM_TEMPLATE
#######################################################################
#                          PART 1  Settings                           #
#######################################################################
# Model
pretrained_model_name_or_path = '/home/llama3/Meta-Llama-3-8B-Instruct'
use_varlen_attn = False  # new

# Data
data_path = '/home/llama3/datasets/multi_turn_dataset_2.json'
prompt_template = PROMPT_TEMPLATE.llama3_chat
max_length = 2048
pack_to_max_length = True

# parallel
sequence_parallel_size = 1

# Scheduler & Optimizer
batch_size = 16  # per_device
accumulative_counts = 1
accumulative_counts *= sequence_parallel_size
dataloader_num_workers = 0
max_epochs = 3
optim_type = AdamW
lr = 1e-4
betas = (0.9, 0.999)
weight_decay = 0
max_norm = 1  # grad clip
warmup_ratio = 0.03

# Save
save_steps = 500
save_total_limit = 2  # Maximum checkpoints to keep (-1 means unlimited)

# Evaluate the generation performance during the training
evaluation_freq = 500
# SYSTEM = SYSTEM_TEMPLATE.alpaca
# evaluation_inputs = [
#     '请给我介绍五个上海的景点', 'Please tell me five scenic spots in Shanghai'
# ]
SYSTEM = "你由EmoLLM团队打造的中文领域心理健康助手, 是一个研究过无数具有心理健康问题的病人与心理健康医生对话的心理专家, 在心理方面拥有广博的知识储备和丰富的研究咨询经验,接下来你将只使用中文来回答和咨询问题。"
evaluation_inputs = [
    '我最近总是感到很焦虑,尤其是在学业上。我有个特别崇拜的同学,他好像在各方面都比我优秀,我总觉得自己怎么努力也追不上他,这让我压力特别大。',
    '我知道应该理性看待,但就是忍不住会去比较。我甚至晚上会因为这个睡不着觉,总想着怎样才能像他那样出色。',
    '我今天心情不好,感觉不开心,很烦。'
]
#######################################################################
#                      PART 2  Model & Tokenizer                      #
#######################################################################
tokenizer = dict(
    type=AutoTokenizer.from_pretrained,
    pretrained_model_name_or_path=pretrained_model_name_or_path,
    trust_remote_code=True,
    padding_side='right')

model = dict(
    type=SupervisedFinetune,
    use_varlen_attn=use_varlen_attn,
    llm=dict(
        type=AutoModelForCausalLM.from_pretrained,
        pretrained_model_name_or_path=pretrained_model_name_or_path,
        trust_remote_code=True,
        torch_dtype=torch.float16,
        # quantization_config=dict(
        #     type=BitsAndBytesConfig,
        #     load_in_4bit=False,
        #     load_in_8bit=False,
        #     llm_int8_threshold=6.0,
        #     llm_int8_has_fp16_weight=False,
        #     bnb_4bit_compute_dtype=torch.float16,
        #     bnb_4bit_use_double_quant=False,
        #     bnb_4bit_quant_type='nf4')
    ),
    lora=dict(
        type=LoraConfig,
        r=32,  # 64
        lora_alpha=64,  # 16
        lora_dropout=0.1,
        bias='none',
        task_type='CAUSAL_LM'))
#######################################################################
#                      PART 3  Dataset & Dataloader                   #
#######################################################################
alpaca_en = dict(
    type=process_hf_dataset,
    # dataset=dict(type=load_dataset, path=alpaca_en_path),
    dataset=dict(type=load_dataset, path='json', data_files=dict(train=data_path)),
    tokenizer=tokenizer,
    max_length=max_length,
    # dataset_map_fn=alpaca_map_fn,
    dataset_map_fn=None,
    template_map_fn=dict(
        type=template_map_fn_factory, template=prompt_template),
    remove_unused_columns=True,
    shuffle_before_pack=True,
    pack_to_max_length=pack_to_max_length,
    use_varlen_attn=use_varlen_attn)

sampler = SequenceParallelSampler \
    if sequence_parallel_size > 1 else DefaultSampler
train_dataloader = dict(
    batch_size=batch_size,
    num_workers=dataloader_num_workers,
    dataset=alpaca_en,
    # sampler=dict(type=sampler, shuffle=True),
    sampler=dict(type=DefaultSampler, shuffle=True),
    collate_fn=dict(type=default_collate_fn, use_varlen_attn=use_varlen_attn))
#######################################################################
#                    PART 4  Scheduler & Optimizer                    #
#######################################################################
# optimizer
optim_wrapper = dict(
    type=AmpOptimWrapper,
    optimizer=dict(
        type=optim_type, lr=lr, betas=betas, weight_decay=weight_decay),
    clip_grad=dict(max_norm=max_norm, error_if_nonfinite=False),
    accumulative_counts=accumulative_counts,
    loss_scale='dynamic',
    dtype='float16')

# learning policy
# More information: https://github.com/open-mmlab/mmengine/blob/main/docs/en/tutorials/param_scheduler.md  # noqa: E501
param_scheduler = [
    dict(
        type=LinearLR,
        start_factor=1e-5,
        by_epoch=True,
        begin=0,
        end=warmup_ratio * max_epochs,
        convert_to_iter_based=True),
    dict(
        type=CosineAnnealingLR,
        eta_min=0.0,
        by_epoch=True,
        begin=warmup_ratio * max_epochs,
        end=max_epochs,
        # T_max=max_epochs,
        convert_to_iter_based=True)
]

# train, val, test setting
# train_cfg = dict(type=TrainLoop, max_epochs=max_epochs)
train_cfg = dict(by_epoch=True, max_epochs=max_epochs, val_interval=1)
#######################################################################
#                           PART 5  Runtime                           #
#######################################################################
# Log the dialogue periodically during the training process, optional
custom_hooks = [
    dict(type=DatasetInfoHook, tokenizer=tokenizer),
    dict(
        type=EvaluateChatHook,
        tokenizer=tokenizer,
        every_n_iters=evaluation_freq,
        evaluation_inputs=evaluation_inputs,
        system=SYSTEM,
        prompt_template=prompt_template)
]

if use_varlen_attn:
    custom_hooks += [dict(type=VarlenAttnArgsToMessageHubHook)]

# configure default hooks
default_hooks = dict(
    # record the time of every iteration.
    timer=dict(type=IterTimerHook),
    # print log every 10 iterations.
    logger=dict(type=LoggerHook, interval=10, log_metric_by_epoch=False),
    # enable the parameter scheduler.
    param_scheduler=dict(type=ParamSchedulerHook),
    # save checkpoint per `save_steps`.
    checkpoint=dict(
        type=CheckpointHook,
        # by_epoch=False,
        interval=save_steps,
        max_keep_ckpts=save_total_limit),
    # set sampler seed in distributed environment.
    sampler_seed=dict(type=DistSamplerSeedHook),
)

# configure environment
env_cfg = dict(
    # whether to enable cudnn benchmark
    cudnn_benchmark=False,
    # set multi process parameters
    mp_cfg=dict(mp_start_method='fork', opencv_num_threads=0),
    # set distributed parameters
    dist_cfg=dict(backend='nccl'),
)

# set visualizer
visualizer = None

# set log level
log_level = 'INFO'

# load from which checkpoint
load_from = None

# whether to resume training from the loaded checkpoint
resume = False

# Defaults to use random seed and disable `deterministic`
randomness = dict(seed=None, deterministic=False)

# set log processor
log_processor = dict(by_epoch=False)