ModelZoo / Qwen2.5_pytorch · Commits · 802ef8b7

Commit 802ef8b7, authored Oct 11, 2024 by luopl

init

Pipeline #1743 failed with stages in 0 seconds.

Showing 20 changed files with 1558 additions and 0 deletions (+1558, -0); the commit changes 263 files in total, and this page lists the first 20.
Changed files on this page:

- LLaMA-Factory/examples/train_lora/qwen2vl_lora_sft.yaml (+39, -0)
- LLaMA-Factory/examples/train_qlora/llama3_lora_sft_aqlm.yaml (+39, -0)
- LLaMA-Factory/examples/train_qlora/llama3_lora_sft_awq.yaml (+39, -0)
- LLaMA-Factory/examples/train_qlora/llama3_lora_sft_gptq.yaml (+39, -0)
- LLaMA-Factory/examples/train_qlora/llama3_lora_sft_otfq.yaml (+41, -0)
- LLaMA-Factory/pyproject.toml (+33, -0)
- LLaMA-Factory/requirements.txt (+22, -0)
- LLaMA-Factory/scripts/cal_flops.py (+50, -0)
- LLaMA-Factory/scripts/cal_lr.py (+98, -0)
- LLaMA-Factory/scripts/cal_mfu.py (+164, -0)
- LLaMA-Factory/scripts/cal_ppl.py (+133, -0)
- LLaMA-Factory/scripts/length_cdf.py (+68, -0)
- LLaMA-Factory/scripts/llama_pro.py (+131, -0)
- LLaMA-Factory/scripts/llamafy_baichuan2.py (+109, -0)
- LLaMA-Factory/scripts/llamafy_qwen.py (+162, -0)
- LLaMA-Factory/scripts/loftq_init.py (+89, -0)
- LLaMA-Factory/scripts/pissa_init.py (+87, -0)
- LLaMA-Factory/scripts/test_toolcall.py (+79, -0)
- LLaMA-Factory/setup.py (+103, -0)
- LLaMA-Factory/src/api.py (+33, -0)
LLaMA-Factory/examples/train_lora/qwen2vl_lora_sft.yaml (new file, mode 100644)

### model
model_name_or_path: Qwen/Qwen2-VL-7B-Instruct

### method
stage: sft
do_train: true
finetuning_type: lora
lora_target: all

### dataset
dataset: mllm_demo,identity  # video: mllm_video_demo
template: qwen2_vl
cutoff_len: 1024
max_samples: 1000
overwrite_cache: true
preprocessing_num_workers: 16

### output
output_dir: saves/qwen2_vl-7b/lora/sft
logging_steps: 10
save_steps: 500
plot_loss: true
overwrite_output_dir: true

### train
per_device_train_batch_size: 1
gradient_accumulation_steps: 8
learning_rate: 1.0e-4
num_train_epochs: 3.0
lr_scheduler_type: cosine
warmup_ratio: 0.1
bf16: true
ddp_timeout: 180000000

### eval
val_size: 0.1
per_device_eval_batch_size: 1
eval_strategy: steps
eval_steps: 500
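Not part of the diff: a minimal usage sketch. Configs like the one above are normally passed to the llamafactory-cli entry point that this commit's setup.py registers; the working directory and an already-installed package are assumptions:

    cd LLaMA-Factory
    llamafactory-cli train examples/train_lora/qwen2vl_lora_sft.yaml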
LLaMA-Factory/examples/train_qlora/llama3_lora_sft_aqlm.yaml (new file, mode 100644)

### model
model_name_or_path: ISTA-DASLab/Meta-Llama-3-8B-Instruct-AQLM-2Bit-1x16

### method
stage: sft
do_train: true
finetuning_type: lora
lora_target: all

### dataset
dataset: identity,alpaca_en_demo
template: llama3
cutoff_len: 1024
max_samples: 1000
overwrite_cache: true
preprocessing_num_workers: 16

### output
output_dir: saves/llama3-8b/lora/sft
logging_steps: 10
save_steps: 500
plot_loss: true
overwrite_output_dir: true

### train
per_device_train_batch_size: 1
gradient_accumulation_steps: 8
learning_rate: 1.0e-4
num_train_epochs: 3.0
lr_scheduler_type: cosine
warmup_ratio: 0.1
bf16: true
ddp_timeout: 180000000

### eval
val_size: 0.1
per_device_eval_batch_size: 1
eval_strategy: steps
eval_steps: 500
LLaMA-Factory/examples/train_qlora/llama3_lora_sft_awq.yaml (new file, mode 100644)

### model
model_name_or_path: TechxGenus/Meta-Llama-3-8B-Instruct-AWQ

### method
stage: sft
do_train: true
finetuning_type: lora
lora_target: all

### dataset
dataset: identity,alpaca_en_demo
template: llama3
cutoff_len: 1024
max_samples: 1000
overwrite_cache: true
preprocessing_num_workers: 16

### output
output_dir: saves/llama3-8b/lora/sft
logging_steps: 10
save_steps: 500
plot_loss: true
overwrite_output_dir: true

### train
per_device_train_batch_size: 1
gradient_accumulation_steps: 8
learning_rate: 1.0e-4
num_train_epochs: 3.0
lr_scheduler_type: cosine
warmup_ratio: 0.1
bf16: true
ddp_timeout: 180000000

### eval
val_size: 0.1
per_device_eval_batch_size: 1
eval_strategy: steps
eval_steps: 500
LLaMA-Factory/examples/train_qlora/llama3_lora_sft_gptq.yaml (new file, mode 100644)

### model
model_name_or_path: TechxGenus/Meta-Llama-3-8B-Instruct-GPTQ

### method
stage: sft
do_train: true
finetuning_type: lora
lora_target: all

### dataset
dataset: identity,alpaca_en_demo
template: llama3
cutoff_len: 1024
max_samples: 1000
overwrite_cache: true
preprocessing_num_workers: 16

### output
output_dir: saves/llama3-8b/lora/sft
logging_steps: 10
save_steps: 500
plot_loss: true
overwrite_output_dir: true

### train
per_device_train_batch_size: 1
gradient_accumulation_steps: 8
learning_rate: 1.0e-4
num_train_epochs: 3.0
lr_scheduler_type: cosine
warmup_ratio: 0.1
bf16: true
ddp_timeout: 180000000

### eval
val_size: 0.1
per_device_eval_batch_size: 1
eval_strategy: steps
eval_steps: 500
LLaMA-Factory/examples/train_qlora/llama3_lora_sft_otfq.yaml (new file, mode 100644)

### model
model_name_or_path: meta-llama/Meta-Llama-3-8B-Instruct
quantization_bit: 4
quantization_method: bitsandbytes  # choices: [bitsandbytes (4/8), hqq (2/3/4/5/6/8), eetq (8)]

### method
stage: sft
do_train: true
finetuning_type: lora
lora_target: all

### dataset
dataset: identity,alpaca_en_demo
template: llama3
cutoff_len: 1024
max_samples: 1000
overwrite_cache: true
preprocessing_num_workers: 16

### output
output_dir: saves/llama3-8b/lora/sft
logging_steps: 10
save_steps: 500
plot_loss: true
overwrite_output_dir: true

### train
per_device_train_batch_size: 1
gradient_accumulation_steps: 8
learning_rate: 1.0e-4
num_train_epochs: 3.0
lr_scheduler_type: cosine
warmup_ratio: 0.1
bf16: true
ddp_timeout: 180000000

### eval
val_size: 0.1
per_device_eval_batch_size: 1
eval_strategy: steps
eval_steps: 500
LLaMA-Factory/pyproject.toml (new file, mode 100644)

[build-system]
requires = ["setuptools>=61.0"]
build-backend = "setuptools.build_meta"

[tool.ruff]
target-version = "py38"
line-length = 119
indent-width = 4

[tool.ruff.lint]
ignore = ["C408", "C901", "E501", "E731", "E741", "W605"]
select = ["C", "E", "F", "I", "W"]

[tool.ruff.lint.isort]
lines-after-imports = 2
known-first-party = ["llamafactory"]
known-third-party = [
    "accelerate",
    "datasets",
    "gradio",
    "numpy",
    "peft",
    "torch",
    "transformers",
    "trl",
]

[tool.ruff.format]
quote-style = "double"
indent-style = "space"
docstring-code-format = true
skip-magic-trailing-comma = false
line-ending = "auto"
LLaMA-Factory/requirements.txt (new file, mode 100644)
transformers==4.41.2
accelerate==0.30.1
datasets==2.16.0
peft==0.11.1
trl==0.8.6
gradio==4.0.0
pandas==2.0.0
scipy
einops
sentencepiece
tiktoken
protobuf
uvicorn
pydantic
fastapi
sse-starlette
matplotlib==3.7.0
fire
packaging
pyyaml
numpy<2.0.0
av
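Not part of the diff: setup.py's get_requires() reads this file at install time, so an editable install pulls in these pins automatically. A minimal sketch, assuming the optional torch and metrics extras defined later in this commit's setup.py are wanted:

    cd LLaMA-Factory
    pip install -e ".[torch,metrics]"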
LLaMA-Factory/scripts/cal_flops.py (new file, mode 100644)

# coding=utf-8
# Copyright 2024 Microsoft Corporation and the LlamaFactory team.
#
# This code is inspired by the Microsoft's DeepSpeed library.
# https://www.deepspeed.ai/tutorials/flops-profiler/
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

import fire
import torch
from deepspeed.accelerator import get_accelerator  # type: ignore
from deepspeed.profiling.flops_profiler import get_model_profile  # type: ignore

from llamafactory.chat import ChatModel


def calculate_flops(
    model_name_or_path: str,
    batch_size: int = 1,
    seq_length: int = 512,
    flash_attn: str = "auto",
):
    r"""
    Calculates the flops of pre-trained models.
    Usage: python cal_flops.py --model_name_or_path path_to_model --batch_size 1 --seq_length 512
    """
    with get_accelerator().device(0):
        chat_model = ChatModel(dict(model_name_or_path=model_name_or_path, template="empty", flash_attn=flash_attn))
        fake_input = torch.ones((batch_size, seq_length), dtype=torch.long, device=chat_model.engine.model.device)
        input_dict = {"input_ids": fake_input, "labels": fake_input.clone()}
        flops, macs, params = get_model_profile(
            chat_model.engine.model, kwargs=input_dict, print_profile=True, detailed=True
        )
        print("FLOPs:", flops)
        print("MACs:", macs)
        print("Params:", params)


if __name__ == "__main__":
    fire.Fire(calculate_flops)
LLaMA-Factory/scripts/cal_lr.py (new file, mode 100644)

# coding=utf-8
# Copyright 2024 imoneoi and the LlamaFactory team.
#
# This code is inspired by the imoneoi's OpenChat library.
# https://github.com/imoneoi/openchat/blob/3.6.0/ochat/training_deepspeed/train.py
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

import math
from typing import Literal

import fire
import torch
from torch.utils.data import DataLoader
from tqdm import tqdm
from transformers import DataCollatorForLanguageModeling, DataCollatorForSeq2Seq

from llamafactory.data import get_dataset, get_template_and_fix_tokenizer
from llamafactory.extras.constants import IGNORE_INDEX
from llamafactory.hparams import get_train_args
from llamafactory.model import load_tokenizer


BASE_LR = 3e-4  # 1.5e-4 for 30B-70B models
BASE_BS = 4_000_000  # from llama paper


def calculate_lr(
    model_name_or_path: str,
    batch_size: int,  # total batch size, namely (batch size * gradient accumulation * world size)
    stage: Literal["pt", "sft"] = "sft",
    dataset: str = "alpaca_en_demo",
    dataset_dir: str = "data",
    template: str = "default",
    cutoff_len: int = 1024,  # i.e. maximum input length during training
    is_mistral_or_gemma: bool = False,  # mistral and gemma models opt for a smaller learning rate,
    packing: bool = False,
):
    r"""
    Calculates the optimal learning rate for 7B/13B models using LLaMA's hyper-parameters.
    Usage:
    python cal_lr.py --model_name_or_path path_to_model --dataset alpaca_en_demo --cutoff_len 1024 --batch_size 16
    """
    model_args, data_args, training_args, _, _ = get_train_args(
        dict(
            stage=stage,
            model_name_or_path=model_name_or_path,
            dataset=dataset,
            dataset_dir=dataset_dir,
            template=template,
            cutoff_len=cutoff_len,
            packing=packing,
            output_dir="dummy_dir",
            overwrite_cache=True,
            do_train=True,
        )
    )
    tokenizer_module = load_tokenizer(model_args)
    tokenizer = tokenizer_module["tokenizer"]
    template = get_template_and_fix_tokenizer(tokenizer, data_args)
    trainset = get_dataset(template, model_args, data_args, training_args, stage, **tokenizer_module)["train_dataset"]
    if stage == "pt":
        data_collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)
    elif stage == "sft":
        data_collator = DataCollatorForSeq2Seq(tokenizer=tokenizer, label_pad_token_id=IGNORE_INDEX)
    else:
        raise NotImplementedError("Stage does not supported: {}.".format(stage))

    dataloader = DataLoader(trainset, batch_size, shuffle=False, collate_fn=data_collator, pin_memory=True)
    valid_tokens, total_tokens = 0, 0
    for batch in tqdm(dataloader):
        valid_tokens += torch.sum(batch["labels"] != IGNORE_INDEX).item()
        total_tokens += torch.numel(batch["labels"])

    batch_max_len = cutoff_len * batch_size  # max tokens in a batch
    valid_ratio = valid_tokens / total_tokens
    batch_valid_len = batch_max_len * valid_ratio
    lr = BASE_LR * math.sqrt(batch_valid_len / BASE_BS)  # lr ~ sqrt(batch_size)
    lr = lr / 6.0 if is_mistral_or_gemma else lr
    print(
        "Optimal learning rate is {:.2e} for valid ratio% {:.2f} and effective batch size {:.2f}".format(
            lr, valid_ratio * 100, batch_valid_len
        )
    )


if __name__ == "__main__":
    fire.Fire(calculate_lr)
LLaMA-Factory/scripts/cal_mfu.py (new file, mode 100644)

# coding=utf-8
# Copyright 2024 the LlamaFactory team.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

import json
import os

import fire
import torch
import torch.distributed as dist
from transformers import AutoConfig

from llamafactory.train.tuner import run_exp


BASE = 2  # gemm (add + mul)


def compute_model_flops(
    model_name_or_path: str,
    total_batch_size: int,
    seq_length: int,
    include_backward: bool = True,
    include_recompute: bool = False,
    include_flashattn: bool = False,
) -> int:
    r"""
    Calculates the FLOPs of model per forward/backward pass.
    """
    config = AutoConfig.from_pretrained(model_name_or_path)
    hidden_size = getattr(config, "hidden_size", None)
    vocab_size = getattr(config, "vocab_size", None)
    intermediate_size = getattr(config, "intermediate_size", None)
    num_attention_heads = getattr(config, "num_attention_heads", None)
    num_key_value_heads = getattr(config, "num_key_value_heads", None)
    num_hidden_layers = getattr(config, "num_hidden_layers", None)
    tie_word_embeddings = getattr(config, "tie_word_embeddings", False)

    # mlp module
    mlp_flops_per_token = 3 * BASE * hidden_size * intermediate_size  # up, gate, down
    mlp_flops = total_batch_size * seq_length * num_hidden_layers * mlp_flops_per_token

    # attn projector module
    q_flops_per_token = BASE * hidden_size * hidden_size
    o_flops_per_token = BASE * hidden_size * hidden_size
    k_flops_per_token = BASE * hidden_size * hidden_size * num_key_value_heads // num_attention_heads
    v_flops_per_token = BASE * hidden_size * hidden_size * num_key_value_heads // num_attention_heads
    attn_proj_flops_per_token = q_flops_per_token + o_flops_per_token + k_flops_per_token + v_flops_per_token
    attn_proj_flops = total_batch_size * seq_length * num_hidden_layers * attn_proj_flops_per_token

    # attn sdpa module
    sdpa_flops_per_layer = 2 * BASE * hidden_size * seq_length * seq_length  # (q * k^T) * v
    sdpa_flops = total_batch_size * num_hidden_layers * sdpa_flops_per_layer

    # embedding module
    embedding_flops_per_token = hidden_size * vocab_size
    embedding_flops = total_batch_size * seq_length * embedding_flops_per_token
    if tie_word_embeddings is False:
        embedding_flops *= 2

    non_embedding_flops = mlp_flops + attn_proj_flops + sdpa_flops
    non_embedding_coeff, embedding_coeff = 1, 1
    if include_backward:
        non_embedding_coeff += 2
        embedding_coeff += 2

    if include_recompute:
        non_embedding_coeff += 1

    total_flops = non_embedding_coeff * non_embedding_flops + embedding_coeff * embedding_flops

    if include_flashattn:
        total_flops += sdpa_flops

    return total_flops


def compute_device_flops(world_size: int) -> float:
    r"""
    Calculates the FLOPs of the device capability per second.
    """
    device_name = torch.cuda.get_device_name()
    if "H100" in device_name or "H800" in device_name:
        return 989 * 1e12 * world_size
    elif "A100" in device_name or "A800" in device_name:
        return 312 * 1e12 * world_size
    elif "V100" in device_name:
        return 125 * 1e12 * world_size
    elif "4090" in device_name:
        return 98 * 1e12 * world_size
    else:
        raise NotImplementedError("Device not supported: {}.".format(device_name))


def calculate_mfu(
    model_name_or_path: str,
    batch_size: int = 1,
    seq_length: int = 1024,
    num_steps: int = 100,
    finetuning_type: str = "lora",
    flash_attn: str = "auto",
    deepspeed_stage: int = 0,
    disable_gc: bool = False,
    liger_kernel: bool = False,
    unsloth_gc: bool = False,
) -> float:
    r"""
    Calculates MFU for given model and hyper-params.
    Usage: python cal_mfu.py --model_name_or_path path_to_model --batch_size 1 --seq_length 1024
    """
    args = {
        "model_name_or_path": model_name_or_path,
        "flash_attn": flash_attn,
        "disable_gradient_checkpointing": disable_gc,
        "enable_liger_kernel": liger_kernel,
        "use_unsloth_gc": unsloth_gc,
        "stage": "pt",
        "do_train": True,
        "finetuning_type": finetuning_type,
        "dataset": "c4_demo",
        "cutoff_len": seq_length,
        "output_dir": os.path.join("saves", "test_mfu"),
        "logging_strategy": "no",
        "save_strategy": "no",
        "save_only_model": True,
        "overwrite_output_dir": True,
        "per_device_train_batch_size": batch_size,
        "max_steps": num_steps,
        "bf16": True,
    }
    if deepspeed_stage in [2, 3]:
        args["deepspeed"] = "examples/deepspeed/ds_z{}_config.json".format(deepspeed_stage)

    run_exp(args)
    with open(os.path.join("saves", "test_mfu", "all_results.json"), "r", encoding="utf-8") as f:
        result = json.load(f)

    if dist.is_initialized():
        world_size = dist.get_world_size()
    else:
        world_size = 1

    total_batch_size = batch_size * world_size
    mfu_value = (
        result["train_steps_per_second"]
        * compute_model_flops(model_name_or_path, total_batch_size, seq_length)
        / compute_device_flops(world_size)
    )
    print("MFU: {:.2f}%".format(mfu_value * 100))


if __name__ == "__main__":
    fire.Fire(calculate_mfu)
LLaMA-Factory/scripts/cal_ppl.py (new file, mode 100644)

# coding=utf-8
# Copyright 2024 the LlamaFactory team.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

import json
from dataclasses import dataclass
from typing import Any, Dict, Literal, Optional, Sequence

import fire
import torch
from torch.utils.data import DataLoader
from tqdm import tqdm
from transformers import DataCollatorForLanguageModeling, DataCollatorForSeq2Seq

from llamafactory.data import get_dataset, get_template_and_fix_tokenizer
from llamafactory.extras.constants import IGNORE_INDEX
from llamafactory.hparams import get_train_args
from llamafactory.model import load_model, load_tokenizer


@dataclass
class PairwiseDataCollatorWithPadding(DataCollatorForSeq2Seq):
    r"""
    Data collator for pairwise data.
    """

    train_on_prompt: bool = False

    def __call__(self, features: Sequence[Dict[str, Any]]) -> Dict[str, torch.Tensor]:
        r"""
        Pads batched data to the longest sequence in the batch.

        We generate 2 * n examples where the first n examples represent chosen examples and
        the last n examples represent rejected examples.
        """
        chosen_features = []
        for feature in features:
            prompt_len, answer_len = len(feature["prompt_ids"]), len(feature["chosen_ids"])
            input_ids = feature["prompt_ids"] + feature["chosen_ids"]
            attention_mask = [1] * (prompt_len + answer_len)
            labels = input_ids if self.train_on_prompt else [IGNORE_INDEX] * prompt_len + feature["chosen_ids"]
            chosen_features.append({"input_ids": input_ids, "attention_mask": attention_mask, "labels": labels})

        return super().__call__(chosen_features)


def calculate_ppl(
    model_name_or_path: str,
    save_name: str,
    batch_size: int = 4,
    stage: Literal["pt", "sft", "rm"] = "sft",
    dataset: str = "alpaca_en_demo",
    dataset_dir: str = "data",
    template: str = "default",
    cutoff_len: int = 1024,
    max_samples: Optional[int] = None,
    train_on_prompt: bool = False,
):
    r"""
    Calculates the ppl on the dataset of the pre-trained models.
    Usage: python cal_ppl.py --model_name_or_path path_to_model --dataset alpaca_en_demo --save_name ppl.json
    """
    model_args, data_args, training_args, finetuning_args, _ = get_train_args(
        dict(
            stage=stage,
            model_name_or_path=model_name_or_path,
            dataset=dataset,
            dataset_dir=dataset_dir,
            template=template,
            cutoff_len=cutoff_len,
            max_samples=max_samples,
            train_on_prompt=train_on_prompt,
            output_dir="dummy_dir",
            overwrite_cache=True,
            do_train=True,
        )
    )
    tokenizer_module = load_tokenizer(model_args)
    tokenizer = tokenizer_module["tokenizer"]
    template = get_template_and_fix_tokenizer(tokenizer, data_args)
    trainset = get_dataset(template, model_args, data_args, training_args, stage, **tokenizer_module)["train_dataset"]
    model = load_model(tokenizer, model_args, finetuning_args, is_trainable=False)
    if stage == "pt":
        data_collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)
    elif stage == "sft":
        data_collator = DataCollatorForSeq2Seq(tokenizer=tokenizer, label_pad_token_id=IGNORE_INDEX)
    elif stage == "rm":
        data_collator = PairwiseDataCollatorWithPadding(
            tokenizer=tokenizer, label_pad_token_id=IGNORE_INDEX, train_on_prompt=train_on_prompt
        )
    else:
        raise NotImplementedError("Stage does not supported: {}.".format(stage))

    dataloader = DataLoader(trainset, batch_size, shuffle=False, collate_fn=data_collator, pin_memory=True)
    criterion = torch.nn.CrossEntropyLoss(reduction="none")
    total_ppl = 0
    perplexities = []
    batch: Dict[str, "torch.Tensor"]
    with torch.no_grad():
        for batch in tqdm(dataloader):
            batch = batch.to(model.device)
            outputs = model(**batch)
            shift_logits: "torch.Tensor" = outputs["logits"][..., :-1, :]
            shift_labels: "torch.Tensor" = batch["labels"][..., 1:]
            loss_mask = shift_labels != IGNORE_INDEX
            flatten_logits = shift_logits.contiguous().view(shift_labels.size(0) * shift_labels.size(1), -1)
            flatten_labels = shift_labels.contiguous().view(-1)
            token_logps: "torch.Tensor" = criterion(flatten_logits, flatten_labels)
            token_logps = token_logps.contiguous().view(shift_logits.size(0), -1)
            sentence_logps = (token_logps * loss_mask).sum(-1) / loss_mask.sum(-1)
            total_ppl += sentence_logps.exp().sum().item()
            perplexities.extend(sentence_logps.exp().tolist())

    with open(save_name, "w", encoding="utf-8") as f:
        json.dump(perplexities, f, indent=2)

    print("Average perplexity is {:.2f}".format(total_ppl / len(perplexities)))
    print("Perplexities have been saved at {}.".format(save_name))


if __name__ == "__main__":
    fire.Fire(calculate_ppl)
LLaMA-Factory/scripts/length_cdf.py (new file, mode 100644)

# coding=utf-8
# Copyright 2024 the LlamaFactory team.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

from collections import defaultdict

import fire
from tqdm import tqdm

from llamafactory.data import get_dataset, get_template_and_fix_tokenizer
from llamafactory.hparams import get_train_args
from llamafactory.model import load_tokenizer


def length_cdf(
    model_name_or_path: str,
    dataset: str = "alpaca_en_demo",
    dataset_dir: str = "data",
    template: str = "default",
    interval: int = 1000,
):
    r"""
    Calculates the distribution of the input lengths in the dataset.
    Usage: python length_cdf.py --model_name_or_path path_to_model --dataset alpaca_en_demo --template default
    """
    model_args, data_args, training_args, _, _ = get_train_args(
        dict(
            stage="sft",
            model_name_or_path=model_name_or_path,
            dataset=dataset,
            dataset_dir=dataset_dir,
            template=template,
            cutoff_len=1_000_000,
            output_dir="dummy_dir",
            overwrite_cache=True,
            do_train=True,
        )
    )
    tokenizer_module = load_tokenizer(model_args)
    template = get_template_and_fix_tokenizer(tokenizer_module["tokenizer"], data_args)
    trainset = get_dataset(template, model_args, data_args, training_args, "sft", **tokenizer_module)["train_dataset"]
    total_num = len(trainset)
    length_dict = defaultdict(int)
    for sample in tqdm(trainset["input_ids"]):
        length_dict[len(sample) // interval * interval] += 1

    length_tuples = list(length_dict.items())
    length_tuples.sort()
    count_accu, prob_accu = 0, 0
    for length, count in length_tuples:
        count_accu += count
        prob_accu += count / total_num * 100
        print("{:d} ({:.2f}%) samples have length < {}.".format(count_accu, prob_accu, length + interval))


if __name__ == "__main__":
    fire.Fire(length_cdf)
LLaMA-Factory/scripts/llama_pro.py (new file, mode 100644)

# coding=utf-8
# Copyright 2024 Tencent Inc. and the LlamaFactory team.
#
# This code is inspired by the Tencent's LLaMA-Pro library.
# https://github.com/TencentARC/LLaMA-Pro/blob/main/scripts/block_expansion.py
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

import json
import os
from collections import OrderedDict
from typing import TYPE_CHECKING

import fire
import torch
from safetensors.torch import save_file
from tqdm import tqdm
from transformers import AutoConfig, AutoModelForCausalLM, AutoTokenizer
from transformers.modeling_utils import (
    SAFE_WEIGHTS_INDEX_NAME,
    SAFE_WEIGHTS_NAME,
    WEIGHTS_INDEX_NAME,
    WEIGHTS_NAME,
    shard_checkpoint,
)


if TYPE_CHECKING:
    from transformers import PretrainedConfig, PreTrainedModel


def change_name(name: str, old_index: int, new_index: int) -> str:
    return name.replace(".{:d}.".format(old_index), ".{:d}.".format(new_index))


def block_expansion(
    model_name_or_path: str,
    output_dir: str,
    num_expand: int,
    shard_size: str = "2GB",
    save_safetensors: bool = True,
):
    r"""
    Performs block expansion for LLaMA, Mistral, Qwen1.5 or Yi models.
    Usage: python llama_pro.py --model_name_or_path meta-llama/Llama-2-7b-hf --output_dir llama2_pro --num_expand 8
    """
    config: "PretrainedConfig" = AutoConfig.from_pretrained(model_name_or_path)
    num_layers = getattr(config, "num_hidden_layers")
    setattr(config, "num_hidden_layers", num_layers + num_expand)
    config.save_pretrained(output_dir)

    tokenizer = AutoTokenizer.from_pretrained(model_name_or_path)
    tokenizer.save_pretrained(output_dir)

    config: "PretrainedConfig" = AutoConfig.from_pretrained(model_name_or_path)  # load the original one
    if save_safetensors:
        setattr(config, "tie_word_embeddings", False)  # safetensors does not allow shared weights

    model: "PreTrainedModel" = AutoModelForCausalLM.from_pretrained(
        model_name_or_path,
        config=config,
        torch_dtype="auto",
        trust_remote_code=True,
        low_cpu_mem_usage=True,
    )
    state_dict = model.state_dict()

    if num_layers % num_expand != 0:
        raise ValueError("`num_layers` {} should be divisible by `num_expand` {}.".format(num_layers, num_expand))

    split = num_layers // num_expand
    layer_cnt = 0
    output_state_dict = OrderedDict()
    for i in range(num_layers):
        for key, value in state_dict.items():
            if ".{:d}.".format(i) in key:
                output_state_dict[change_name(key, i, layer_cnt)] = value

        print("Add layer {} copied from layer {}".format(layer_cnt, i))
        layer_cnt += 1
        if (i + 1) % split == 0:
            for key, value in state_dict.items():
                if ".{:d}.".format(i) in key:
                    if "down_proj" in key or "o_proj" in key:
                        output_state_dict[change_name(key, i, layer_cnt)] = torch.zeros_like(value)
                    else:
                        output_state_dict[change_name(key, i, layer_cnt)] = torch.clone(value)

            print("Add layer {} expanded from layer {}".format(layer_cnt, i))
            layer_cnt += 1

    for key, value in state_dict.items():
        if key not in output_state_dict:
            output_state_dict[key] = value

    weights_name = SAFE_WEIGHTS_NAME if save_safetensors else WEIGHTS_NAME
    shards, index = shard_checkpoint(output_state_dict, max_shard_size=shard_size, weights_name=weights_name)

    for shard_file, shard in tqdm(shards.items(), desc="Save weights"):
        if save_safetensors:
            save_file(shard, os.path.join(output_dir, shard_file), metadata={"format": "pt"})
        else:
            torch.save(shard, os.path.join(output_dir, shard_file))

    if index is None:
        print("Model weights saved in {}".format(os.path.join(output_dir, weights_name)))
    else:
        index_name = SAFE_WEIGHTS_INDEX_NAME if save_safetensors else WEIGHTS_INDEX_NAME
        with open(os.path.join(output_dir, index_name), "w", encoding="utf-8") as f:
            json.dump(index, f, indent=2, sort_keys=True)

        print("Model weights saved in {}".format(output_dir))

    print("- Fine-tune this model with:")
    print("model_name_or_path: {}".format(output_dir))
    print("finetuning_type: freeze")
    print("freeze_trainable_layers: {}".format(num_expand))
    print("use_llama_pro: true")


if __name__ == "__main__":
    fire.Fire(block_expansion)
LLaMA-Factory/scripts/llamafy_baichuan2.py (new file, mode 100644)

# coding=utf-8
# Copyright 2024 the LlamaFactory team.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

import json
import os
from collections import OrderedDict
from typing import Any, Dict

import fire
import torch
from safetensors.torch import save_file
from tqdm import tqdm
from transformers.modeling_utils import (
    SAFE_WEIGHTS_INDEX_NAME,
    SAFE_WEIGHTS_NAME,
    WEIGHTS_INDEX_NAME,
    WEIGHTS_NAME,
    shard_checkpoint,
)


CONFIG_NAME = "config.json"


def save_weight(input_dir: str, output_dir: str, shard_size: str, save_safetensors: bool):
    baichuan2_state_dict: Dict[str, torch.Tensor] = OrderedDict()
    for filepath in tqdm(os.listdir(input_dir), desc="Load weights"):
        if os.path.isfile(os.path.join(input_dir, filepath)) and filepath.endswith(".bin"):
            shard_weight = torch.load(os.path.join(input_dir, filepath), map_location="cpu")
            baichuan2_state_dict.update(shard_weight)

    llama2_state_dict: Dict[str, torch.Tensor] = OrderedDict()
    for key, value in tqdm(baichuan2_state_dict.items(), desc="Convert format"):
        if "W_pack" in key:
            proj_size = value.size(0) // 3
            llama2_state_dict[key.replace("W_pack", "q_proj")] = value[:proj_size, :]
            llama2_state_dict[key.replace("W_pack", "k_proj")] = value[proj_size : 2 * proj_size, :]
            llama2_state_dict[key.replace("W_pack", "v_proj")] = value[2 * proj_size :, :]
        elif "lm_head" in key:
            llama2_state_dict[key] = torch.nn.functional.normalize(value)
        else:
            llama2_state_dict[key] = value

    weights_name = SAFE_WEIGHTS_NAME if save_safetensors else WEIGHTS_NAME
    shards, index = shard_checkpoint(llama2_state_dict, max_shard_size=shard_size, weights_name=weights_name)

    for shard_file, shard in tqdm(shards.items(), desc="Save weights"):
        if save_safetensors:
            save_file(shard, os.path.join(output_dir, shard_file), metadata={"format": "pt"})
        else:
            torch.save(shard, os.path.join(output_dir, shard_file))

    if index is None:
        print("Model weights saved in {}".format(os.path.join(output_dir, WEIGHTS_NAME)))
    else:
        index_name = SAFE_WEIGHTS_INDEX_NAME if save_safetensors else WEIGHTS_INDEX_NAME
        with open(os.path.join(output_dir, index_name), "w", encoding="utf-8") as f:
            json.dump(index, f, indent=2, sort_keys=True)

        print("Model weights saved in {}".format(output_dir))


def save_config(input_dir: str, output_dir: str):
    with open(os.path.join(input_dir, CONFIG_NAME), "r", encoding="utf-8") as f:
        llama2_config_dict: Dict[str, Any] = json.load(f)

    llama2_config_dict["architectures"] = ["LlamaForCausalLM"]
    llama2_config_dict.pop("auto_map", None)
    llama2_config_dict.pop("tokenizer_class", None)
    llama2_config_dict["model_type"] = "llama"

    with open(os.path.join(output_dir, CONFIG_NAME), "w", encoding="utf-8") as f:
        json.dump(llama2_config_dict, f, indent=2)

    print("Model config saved in {}".format(os.path.join(output_dir, CONFIG_NAME)))


def llamafy_baichuan2(
    input_dir: str,
    output_dir: str,
    shard_size: str = "2GB",
    save_safetensors: bool = True,
):
    r"""
    Converts the Baichuan2-7B model in the same format as LLaMA2-7B.
    Usage: python llamafy_baichuan2.py --input_dir input --output_dir output
    Converted model: https://huggingface.co/hiyouga/Baichuan2-7B-Base-LLaMAfied
    """
    try:
        os.makedirs(output_dir, exist_ok=False)
    except Exception as e:
        raise print("Output dir already exists", e)

    save_weight(input_dir, output_dir, shard_size, save_safetensors)
    save_config(input_dir, output_dir)


if __name__ == "__main__":
    fire.Fire(llamafy_baichuan2)
LLaMA-Factory/scripts/llamafy_qwen.py (new file, mode 100644)

# coding=utf-8
# Copyright 2024 the LlamaFactory team.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

import json
import os
from collections import OrderedDict
from typing import Any, Dict

import fire
import torch
from safetensors import safe_open
from safetensors.torch import save_file
from tqdm import tqdm
from transformers.modeling_utils import (
    SAFE_WEIGHTS_INDEX_NAME,
    SAFE_WEIGHTS_NAME,
    WEIGHTS_INDEX_NAME,
    WEIGHTS_NAME,
    shard_checkpoint,
)
from transformers.utils import check_min_version


try:
    check_min_version("4.34.0")
except Exception:
    raise ValueError("Please upgrade `transformers` to 4.34.0")


CONFIG_NAME = "config.json"


def save_weight(input_dir: str, output_dir: str, shard_size: str, save_safetensors: bool) -> str:
    qwen_state_dict: Dict[str, torch.Tensor] = OrderedDict()
    for filepath in tqdm(os.listdir(input_dir), desc="Load weights"):
        if os.path.isfile(os.path.join(input_dir, filepath)) and filepath.endswith(".safetensors"):
            with safe_open(os.path.join(input_dir, filepath), framework="pt", device="cpu") as f:
                for key in f.keys():
                    qwen_state_dict[key] = f.get_tensor(key)

    llama2_state_dict: Dict[str, torch.Tensor] = OrderedDict()
    torch_dtype = None
    for key, value in tqdm(qwen_state_dict.items(), desc="Convert format"):
        if torch_dtype is None:
            torch_dtype = value.dtype

        if "wte" in key:
            llama2_state_dict["model.embed_tokens.weight"] = value
        elif "ln_f" in key:
            llama2_state_dict["model.norm.weight"] = value
        else:
            key = key.replace("transformer.h", "model.layers")
            if "attn.c_attn" in key:
                proj_size = value.size(0) // 3
                llama2_state_dict[key.replace("attn.c_attn", "self_attn.q_proj")] = value[:proj_size, ...]
                llama2_state_dict[key.replace("attn.c_attn", "self_attn.k_proj")] = value[
                    proj_size : 2 * proj_size, ...
                ]
                llama2_state_dict[key.replace("attn.c_attn", "self_attn.v_proj")] = value[2 * proj_size :, ...]
            elif "attn.c_proj" in key:
                llama2_state_dict[key.replace("attn.c_proj", "self_attn.o_proj")] = value
                llama2_state_dict[key.replace("attn.c_proj.weight", "self_attn.o_proj.bias")] = torch.zeros_like(
                    value[:, 0]
                ).squeeze()
            elif "ln_1" in key:
                llama2_state_dict[key.replace("ln_1", "input_layernorm")] = value
            elif "ln_2" in key:
                llama2_state_dict[key.replace("ln_2", "post_attention_layernorm")] = value
            elif "mlp.w1" in key:
                llama2_state_dict[key.replace("mlp.w1", "mlp.up_proj")] = value
            elif "mlp.w2" in key:
                llama2_state_dict[key.replace("mlp.w2", "mlp.gate_proj")] = value
            elif "mlp.c_proj" in key:
                llama2_state_dict[key.replace("mlp.c_proj", "mlp.down_proj")] = value
            elif "lm_head" in key:
                llama2_state_dict[key] = value
            else:
                raise KeyError("Unable to process key {}".format(key))

    weights_name = SAFE_WEIGHTS_NAME if save_safetensors else WEIGHTS_NAME
    shards, index = shard_checkpoint(llama2_state_dict, max_shard_size=shard_size, weights_name=weights_name)

    for shard_file, shard in tqdm(shards.items(), desc="Save weights"):
        if save_safetensors:
            save_file(shard, os.path.join(output_dir, shard_file), metadata={"format": "pt"})
        else:
            torch.save(shard, os.path.join(output_dir, shard_file))

    if index is None:
        print("Model weights saved in {}".format(os.path.join(output_dir, weights_name)))
    else:
        index_name = SAFE_WEIGHTS_INDEX_NAME if save_safetensors else WEIGHTS_INDEX_NAME
        with open(os.path.join(output_dir, index_name), "w", encoding="utf-8") as f:
            json.dump(index, f, indent=2, sort_keys=True)

        print("Model weights saved in {}".format(output_dir))

    return str(torch_dtype).replace("torch.", "")


def save_config(input_dir: str, output_dir: str, torch_dtype: str):
    with open(os.path.join(input_dir, CONFIG_NAME), "r", encoding="utf-8") as f:
        qwen_config_dict: Dict[str, Any] = json.load(f)

    llama2_config_dict: Dict[str, Any] = OrderedDict()
    llama2_config_dict["architectures"] = ["LlamaForCausalLM"]
    llama2_config_dict["hidden_act"] = "silu"
    llama2_config_dict["hidden_size"] = qwen_config_dict["hidden_size"]
    llama2_config_dict["initializer_range"] = qwen_config_dict["initializer_range"]
    llama2_config_dict["intermediate_size"] = qwen_config_dict["intermediate_size"] // 2
    llama2_config_dict["max_position_embeddings"] = qwen_config_dict["max_position_embeddings"]
    llama2_config_dict["model_type"] = "llama"
    llama2_config_dict["num_attention_heads"] = qwen_config_dict["num_attention_heads"]
    llama2_config_dict["num_hidden_layers"] = qwen_config_dict["num_hidden_layers"]
    llama2_config_dict["num_key_value_heads"] = qwen_config_dict["hidden_size"] // qwen_config_dict["kv_channels"]
    llama2_config_dict["pretraining_tp"] = 1
    llama2_config_dict["rms_norm_eps"] = qwen_config_dict["layer_norm_epsilon"]
    llama2_config_dict["rope_scaling"] = None
    llama2_config_dict["tie_word_embeddings"] = qwen_config_dict["tie_word_embeddings"]
    llama2_config_dict["torch_dtype"] = torch_dtype
    llama2_config_dict["transformers_version"] = "4.34.0"
    llama2_config_dict["use_cache"] = True
    llama2_config_dict["vocab_size"] = qwen_config_dict["vocab_size"]
    llama2_config_dict["attention_bias"] = True

    with open(os.path.join(output_dir, CONFIG_NAME), "w", encoding="utf-8") as f:
        json.dump(llama2_config_dict, f, indent=2)

    print("Model config saved in {}".format(os.path.join(output_dir, CONFIG_NAME)))


def llamafy_qwen(
    input_dir: str,
    output_dir: str,
    shard_size: str = "2GB",
    save_safetensors: bool = False,
):
    r"""
    Converts the Qwen models in the same format as LLaMA2.
    Usage: python llamafy_qwen.py --input_dir input --output_dir output
    Converted model: https://huggingface.co/hiyouga/Qwen-14B-Chat-LLaMAfied
    """
    try:
        os.makedirs(output_dir, exist_ok=False)
    except Exception as e:
        raise print("Output dir already exists", e)

    torch_dtype = save_weight(input_dir, output_dir, shard_size, save_safetensors)
    save_config(input_dir, output_dir, torch_dtype)


if __name__ == "__main__":
    fire.Fire(llamafy_qwen)
LLaMA-Factory/scripts/loftq_init.py (new file, mode 100644)

# coding=utf-8
# Copyright 2024 HuggingFace Inc. and the LlamaFactory team.
#
# This code is based on the HuggingFace's PEFT library.
# https://github.com/huggingface/peft/blob/v0.10.0/examples/loftq_finetuning/quantize_save_load.py
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

import os
from typing import TYPE_CHECKING

import fire
from peft import LoftQConfig, LoraConfig, TaskType, get_peft_model
from transformers import AutoModelForCausalLM, AutoTokenizer


if TYPE_CHECKING:
    from transformers import PreTrainedModel


def quantize_loftq(
    model_name_or_path: str,
    output_dir: str,
    loftq_bits: int = 4,
    loftq_iter: int = 4,
    lora_alpha: int = None,
    lora_rank: int = 16,
    lora_dropout: float = 0,
    lora_target: tuple = ("q_proj", "v_proj"),
    save_safetensors: bool = True,
):
    r"""
    Initializes LoRA weights with LoRA-fine-tuning-aware Quantization (LoftQ)
    Usage: python loftq_init.py --model_name_or_path path_to_model --output_dir output_dir
    """
    if isinstance(lora_target, str):
        lora_target = [name.strip() for name in lora_target.split(",")]

    tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained(model_name_or_path, trust_remote_code=True, torch_dtype="auto")

    loftq_config = LoftQConfig(loftq_bits=loftq_bits, loftq_iter=loftq_iter)
    lora_config = LoraConfig(
        task_type=TaskType.CAUSAL_LM,
        inference_mode=True,
        r=lora_rank,
        lora_alpha=lora_alpha if lora_alpha is not None else lora_rank * 2,
        lora_dropout=lora_dropout,
        target_modules=lora_target,
        init_lora_weights="loftq",
        loftq_config=loftq_config,
    )

    # Init LoftQ model
    print("Initializing LoftQ weights, it may be take several minutes, wait patiently.")
    peft_model = get_peft_model(model, lora_config)
    loftq_dir = os.path.join(output_dir, "loftq_init")

    # Save LoftQ model
    setattr(peft_model.peft_config["default"], "base_model_name_or_path", os.path.abspath(output_dir))
    setattr(peft_model.peft_config["default"], "init_lora_weights", True)  # don't apply loftq again
    peft_model.save_pretrained(loftq_dir, safe_serialization=save_safetensors)
    print("Adapter weights saved in {}".format(loftq_dir))

    # Save base model
    base_model: "PreTrainedModel" = peft_model.unload()
    base_model.save_pretrained(output_dir, safe_serialization=save_safetensors)
    tokenizer.save_pretrained(output_dir)
    print("Model weights saved in {}".format(output_dir))

    print("- Fine-tune this model with:")
    print("model_name_or_path: {}".format(output_dir))
    print("adapter_name_or_path: {}".format(loftq_dir))
    print("finetuning_type: lora")
    print("quantization_bit: {}".format(loftq_bits))


if __name__ == "__main__":
    fire.Fire(quantize_loftq)
LLaMA-Factory/scripts/pissa_init.py (new file, mode 100644)

# coding=utf-8
# Copyright 2024 HuggingFace Inc. and the LlamaFactory team.
#
# This code is based on the HuggingFace's PEFT library.
# https://github.com/huggingface/peft/blob/v0.11.0/examples/pissa_finetuning/preprocess.py
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

import os
from typing import TYPE_CHECKING

import fire
from peft import LoraConfig, TaskType, get_peft_model
from transformers import AutoModelForCausalLM, AutoTokenizer


if TYPE_CHECKING:
    from transformers import PreTrainedModel


def quantize_pissa(
    model_name_or_path: str,
    output_dir: str,
    pissa_iter: int = 16,
    lora_alpha: int = None,
    lora_rank: int = 16,
    lora_dropout: float = 0,
    lora_target: tuple = ("q_proj", "v_proj"),
    save_safetensors: bool = True,
):
    r"""
    Initializes LoRA weights with Principal Singular values and Singular vectors Adaptation (PiSSA)
    Usage: python pissa_init.py --model_name_or_path path_to_model --output_dir output_dir
    """
    if isinstance(lora_target, str):
        lora_target = [name.strip() for name in lora_target.split(",")]

    tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained(model_name_or_path, trust_remote_code=True, torch_dtype="auto")

    lora_config = LoraConfig(
        task_type=TaskType.CAUSAL_LM,
        r=lora_rank,
        lora_alpha=lora_alpha if lora_alpha is not None else lora_rank * 2,
        lora_dropout=lora_dropout,
        target_modules=lora_target,
        init_lora_weights="pissa" if pissa_iter == -1 else "pissa_niter_{}".format(pissa_iter),
    )

    # Init PiSSA model
    peft_model = get_peft_model(model, lora_config)
    pissa_dir = os.path.join(output_dir, "pissa_init")

    # Save PiSSA model
    setattr(peft_model.peft_config["default"], "base_model_name_or_path", os.path.abspath(output_dir))
    setattr(peft_model.peft_config["default"], "init_lora_weights", True)  # don't apply pissa again
    peft_model.save_pretrained(pissa_dir, safe_serialization=save_safetensors)
    print("Adapter weights saved in {}".format(pissa_dir))

    # Save base model
    base_model: "PreTrainedModel" = peft_model.unload()
    base_model.save_pretrained(output_dir, safe_serialization=save_safetensors)
    tokenizer.save_pretrained(output_dir)
    print("Model weights saved in {}".format(output_dir))

    print("- Fine-tune this model with:")
    print("model_name_or_path: {}".format(output_dir))
    print("adapter_name_or_path: {}".format(pissa_dir))
    print("finetuning_type: lora")
    print("pissa_init: false")
    print("pissa_convert: true")
    print("- and optionally with:")
    print("quantization_bit: 4")


if __name__ == "__main__":
    fire.Fire(quantize_pissa)
LLaMA-Factory/scripts/test_toolcall.py (new file, mode 100644)

# coding=utf-8
# Copyright 2024 the LlamaFactory team.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

import json
import os
from typing import Sequence

from openai import OpenAI
from transformers.utils.versions import require_version


require_version("openai>=1.5.0", "To fix: pip install openai>=1.5.0")


def calculate_gpa(grades: Sequence[str], hours: Sequence[int]) -> float:
    grade_to_score = {"A": 4, "B": 3, "C": 2}
    total_score, total_hour = 0, 0
    for grade, hour in zip(grades, hours):
        total_score += grade_to_score[grade] * hour
        total_hour += hour

    return round(total_score / total_hour, 2)


def main():
    client = OpenAI(
        api_key="{}".format(os.environ.get("API_KEY", "0")),
        base_url="http://localhost:{}/v1".format(os.environ.get("API_PORT", 8000)),
    )
    tools = [
        {
            "type": "function",
            "function": {
                "name": "calculate_gpa",
                "description": "Calculate the Grade Point Average (GPA) based on grades and credit hours",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "grades": {"type": "array", "items": {"type": "string"}, "description": "The grades"},
                        "hours": {"type": "array", "items": {"type": "integer"}, "description": "The credit hours"},
                    },
                    "required": ["grades", "hours"],
                },
            },
        }
    ]
    tool_map = {"calculate_gpa": calculate_gpa}

    messages = []
    messages.append({"role": "user", "content": "My grades are A, A, B, and C. The credit hours are 3, 4, 3, and 2."})
    result = client.chat.completions.create(messages=messages, model="test", tools=tools)
    if result.choices[0].message.tool_calls is None:
        raise ValueError("Cannot retrieve function call from the response.")

    messages.append(result.choices[0].message)
    tool_call = result.choices[0].message.tool_calls[0].function
    print(tool_call)
    # Function(arguments='{"grades": ["A", "A", "B", "C"], "hours": [3, 4, 3, 2]}', name='calculate_gpa')
    name, arguments = tool_call.name, json.loads(tool_call.arguments)
    tool_result = tool_map[name](**arguments)
    messages.append({"role": "tool", "content": json.dumps({"gpa": tool_result}, ensure_ascii=False)})
    result = client.chat.completions.create(messages=messages, model="test", tools=tools)
    print(result.choices[0].message.content)
    # Based on the grades and credit hours you provided, your Grade Point Average (GPA) is 3.42.


if __name__ == "__main__":
    main()
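Not part of the diff: the test reads API_KEY and API_PORT from the environment and calls http://localhost:<API_PORT>/v1, so an OpenAI-compatible server (for example the one in src/api.py below) must already be running with a tool-call-capable model. A minimal sketch, assuming the script's default port 8000:

    API_PORT=8000 python scripts/test_toolcall.py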
LLaMA-Factory/setup.py (new file, mode 100644)

# Copyright 2024 the LlamaFactory team.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

import os
import re
from typing import List

from setuptools import find_packages, setup


def get_version() -> str:
    with open(os.path.join("src", "llamafactory", "extras", "env.py"), "r", encoding="utf-8") as f:
        file_content = f.read()
        pattern = r"{}\W*=\W*\"([^\"]+)\"".format("VERSION")
        (version,) = re.findall(pattern, file_content)
        return version


def get_requires() -> List[str]:
    with open("requirements.txt", "r", encoding="utf-8") as f:
        file_content = f.read()
        lines = [line.strip() for line in file_content.strip().split("\n") if not line.startswith("#")]
        return lines


def get_console_scripts() -> List[str]:
    console_scripts = ["llamafactory-cli = llamafactory.cli:main"]
    if os.environ.get("ENABLE_SHORT_CONSOLE", "1").lower() in ["true", "1"]:
        console_scripts.append("lmf = llamafactory.cli:main")

    return console_scripts


extra_require = {
    "torch": ["torch>=1.13.1"],
    "torch-npu": ["torch==2.1.0", "torch-npu==2.1.0.post3", "decorator"],
    "metrics": ["nltk", "jieba", "rouge-chinese"],
    "deepspeed": ["deepspeed>=0.10.0,<=0.14.4"],
    "liger-kernel": ["liger-kernel"],
    "bitsandbytes": ["bitsandbytes>=0.39.0"],
    "hqq": ["hqq"],
    "eetq": ["eetq"],
    "gptq": ["optimum>=1.17.0", "auto-gptq>=0.5.0"],
    "awq": ["autoawq"],
    "aqlm": ["aqlm[gpu]>=1.1.0"],
    "vllm": ["vllm>=0.4.3,<=0.6.2"],
    "galore": ["galore-torch"],
    "badam": ["badam>=1.2.1"],
    "adam-mini": ["adam-mini"],
    "qwen": ["transformers_stream_generator"],
    "modelscope": ["modelscope"],
    "dev": ["ruff", "pytest"],
}


def main():
    setup(
        name="llamafactory",
        version=get_version(),
        author="hiyouga",
        author_email="hiyouga" "@" "buaa.edu.cn",
        description="Easy-to-use LLM fine-tuning framework",
        long_description=open("README.md", "r", encoding="utf-8").read(),
        long_description_content_type="text/markdown",
        keywords=["LLaMA", "BLOOM", "Falcon", "LLM", "ChatGPT", "transformer", "pytorch", "deep learning"],
        license="Apache 2.0 License",
        url="https://github.com/hiyouga/LLaMA-Factory",
        package_dir={"": "src"},
        packages=find_packages("src"),
        python_requires=">=3.8.0",
        install_requires=get_requires(),
        extras_require=extra_require,
        entry_points={"console_scripts": get_console_scripts()},
        classifiers=[
            "Development Status :: 4 - Beta",
            "Intended Audience :: Developers",
            "Intended Audience :: Education",
            "Intended Audience :: Science/Research",
            "License :: OSI Approved :: Apache Software License",
            "Operating System :: OS Independent",
            "Programming Language :: Python :: 3",
            "Programming Language :: Python :: 3.8",
            "Programming Language :: Python :: 3.9",
            "Programming Language :: Python :: 3.10",
            "Programming Language :: Python :: 3.11",
            "Topic :: Scientific/Engineering :: Artificial Intelligence",
        ],
    )


if __name__ == "__main__":
    main()
LLaMA-Factory/src/api.py (new file, mode 100644)

# Copyright 2024 the LlamaFactory team.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

import os

import uvicorn

from llamafactory.api.app import create_app
from llamafactory.chat import ChatModel


def main():
    chat_model = ChatModel()
    app = create_app(chat_model)
    api_host = os.environ.get("API_HOST", "0.0.0.0")
    api_port = int(os.environ.get("API_PORT", "8000"))
    print("Visit http://localhost:{}/docs for API document.".format(api_port))
    uvicorn.run(app, host=api_host, port=api_port)


if __name__ == "__main__":
    main()
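Not part of the diff: a minimal sketch of serving a model with this entry point and checking it. The --model_name_or_path and --template flags are assumptions about how ChatModel() collects its inference arguments from the command line, and path_to_model is a placeholder:

    API_PORT=8000 python src/api.py --model_name_or_path path_to_model --template default
    curl http://localhost:8000/docs    # the startup message points here for the API document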