ModelZoo / DISC-FinLLM_pytorch, commit afe180a6

Commit afe180a6 ("Initial commit"), authored May 21, 2024 by wanglch. Pipeline #1006 was canceled.
Showing 20 of the commit's 258 changed files, with 616 additions and 0 deletions.
Changed files on this page:

LLaMA-Factory/examples/README.md (+50, -0)
LLaMA-Factory/examples/README_zh.md (+50, -0)
LLaMA-Factory/examples/accelerate/fsdp_config.yaml (+25, -0)
LLaMA-Factory/examples/accelerate/master_config.yaml (+18, -0)
LLaMA-Factory/examples/accelerate/single_config.yaml (+16, -0)
LLaMA-Factory/examples/accelerate/slave_config.yaml (+18, -0)
LLaMA-Factory/examples/deepspeed/ds_z2_config.json (+29, -0)
LLaMA-Factory/examples/deepspeed/ds_z2_offload_config.json (+33, -0)
LLaMA-Factory/examples/deepspeed/ds_z3_config.json (+30, -0)
LLaMA-Factory/examples/deepspeed/ds_z3_offload_config.json (+39, -0)
LLaMA-Factory/examples/extras/badam/sft.sh (+35, -0)
LLaMA-Factory/examples/extras/fsdp_qlora/sft.sh (+41, -0)
LLaMA-Factory/examples/extras/galore/sft.sh (+36, -0)
LLaMA-Factory/examples/extras/llama_pro/expand.sh (+6, -0)
LLaMA-Factory/examples/extras/llama_pro/sft.sh (+34, -0)
LLaMA-Factory/examples/extras/loraplus/sft.sh (+33, -0)
LLaMA-Factory/examples/extras/mod/sft.sh (+33, -0)
LLaMA-Factory/examples/full_multi_gpu/multi_node.sh (+38, -0)
LLaMA-Factory/examples/full_multi_gpu/predict.sh (+20, -0)
LLaMA-Factory/examples/full_multi_gpu/single_node.sh (+32, -0)
LLaMA-Factory/examples/README.md (new file, mode 100644)
We provide diverse examples of fine-tuning LLMs.
```
examples/
├── lora_single_gpu/
│   ├── pretrain.sh: Do continuous pre-training using LoRA
│   ├── sft.sh: Do supervised fine-tuning using LoRA
│   ├── reward.sh: Do reward modeling using LoRA
│   ├── ppo.sh: Do PPO training using LoRA
│   ├── dpo.sh: Do DPO training using LoRA
│   ├── orpo.sh: Do ORPO training using LoRA
│   ├── sft_mllm.sh: Do supervised fine-tuning on multimodal data using LoRA
│   ├── prepare.sh: Save tokenized dataset
│   └── predict.sh: Do batch predict and compute BLEU and ROUGE scores after LoRA tuning
├── qlora_single_gpu/
│   ├── bitsandbytes.sh: Fine-tune 4/8-bit BNB models using QLoRA
│   ├── gptq.sh: Fine-tune 4/8-bit GPTQ models using QLoRA
│   ├── awq.sh: Fine-tune 4-bit AWQ models using QLoRA
│   └── aqlm.sh: Fine-tune 2-bit AQLM models using QLoRA
├── lora_multi_gpu/
│   ├── single_node.sh: Fine-tune model with Accelerate on single node using LoRA
│   ├── multi_node.sh: Fine-tune model with Accelerate on multiple nodes using LoRA
│   └── ds_zero3.sh: Fine-tune model with DeepSpeed ZeRO-3 using LoRA (weight sharding)
├── full_multi_gpu/
│   ├── single_node.sh: Full fine-tune model with DeepSpeed on single node
│   ├── multi_node.sh: Full fine-tune model with DeepSpeed on multiple nodes
│   └── predict.sh: Do parallel batch predict and compute BLEU and ROUGE scores after full tuning
├── merge_lora/
│   ├── merge.sh: Merge LoRA weights into the pre-trained models
│   └── quantize.sh: Quantize the fine-tuned model with AutoGPTQ
├── inference/
│   ├── cli_demo.sh: Chat with fine-tuned model in the CLI with LoRA adapters
│   ├── api_demo.sh: Chat with fine-tuned model in an OpenAI-style API with LoRA adapters
│   ├── web_demo.sh: Chat with fine-tuned model in the Web browser with LoRA adapters
│   └── evaluate.sh: Evaluate model on the MMLU/CMMLU/C-Eval benchmarks with LoRA adapters
└── extras/
    ├── galore/
    │   └── sft.sh: Fine-tune model with GaLore
    ├── badam/
    │   └── sft.sh: Fine-tune model with BAdam
    ├── loraplus/
    │   └── sft.sh: Fine-tune model using LoRA+
    ├── mod/
    │   └── sft.sh: Fine-tune model using Mixture-of-Depths
    ├── llama_pro/
    │   ├── expand.sh: Expand layers in the model
    │   └── sft.sh: Fine-tune the expanded model
    └── fsdp_qlora/
        └── sft.sh: Fine-tune quantized model with FSDP+QLoRA
```
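The scripts listed above are thin shell wrappers around src/train_bash.py and reference paths relative to their own directory, so they are normally launched from inside that directory. A minimal sketch, assuming dependencies are installed and the referenced datasets are available under data/ (the lora_single_gpu scripts themselves appear on a later page of this commit):

```
cd LLaMA-Factory/examples/lora_single_gpu
bash sft.sh   # supervised fine-tuning with LoRA on a single GPU
```
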
LLaMA-Factory/examples/README_zh.md (new file, mode 100644)
We provide a diverse set of example scripts for fine-tuning large language models.
```
examples/
├── lora_single_gpu/
│   ├── pretrain.sh: Continued pre-training with LoRA
│   ├── sft.sh: Supervised instruction fine-tuning with LoRA
│   ├── reward.sh: Reward model training with LoRA
│   ├── ppo.sh: PPO training with LoRA
│   ├── dpo.sh: DPO training with LoRA
│   ├── orpo.sh: ORPO training with LoRA
│   ├── sft_mllm.sh: Multimodal supervised instruction fine-tuning with LoRA
│   ├── prepare.sh: Save the preprocessed dataset
│   └── predict.sh: Batch prediction with LoRA and computation of BLEU and ROUGE scores
├── qlora_single_gpu/
│   ├── bitsandbytes.sh: Fine-tune 4/8-bit BNB models with QLoRA
│   ├── gptq.sh: Fine-tune 4/8-bit GPTQ models with QLoRA
│   ├── awq.sh: Fine-tune 4-bit AWQ models with QLoRA
│   └── aqlm.sh: Fine-tune 2-bit AQLM models with QLoRA
├── lora_multi_gpu/
│   ├── single_node.sh: Single-node LoRA training with Accelerate
│   ├── multi_node.sh: Multi-node LoRA training with Accelerate
│   └── ds_zero3.sh: LoRA training with DeepSpeed ZeRO-3 (weight sharding)
├── full_multi_gpu/
│   ├── single_node.sh: Single-node full-parameter training with DeepSpeed
│   ├── multi_node.sh: Multi-node full-parameter training with DeepSpeed
│   └── predict.sh: Multi-GPU batch prediction after full-parameter training, with BLEU and ROUGE scores
├── merge_lora/
│   ├── merge.sh: Merge LoRA weights into the pre-trained model
│   └── quantize.sh: Quantize the fine-tuned model with AutoGPTQ
├── inference/
│   ├── cli_demo.sh: Launch a command-line inference interface for the LoRA model
│   ├── api_demo.sh: Launch an OpenAI-style API for the LoRA model
│   ├── web_demo.sh: Launch a browser-based inference interface for the LoRA model
│   └── evaluate.sh: Evaluate the LoRA model on the MMLU/CMMLU/C-Eval benchmarks
└── extras/
    ├── galore/
    │   └── sft.sh: Train a model with GaLore
    ├── badam/
    │   └── sft.sh: Train a model with BAdam
    ├── loraplus/
    │   └── sft.sh: Train a model with LoRA+
    ├── mod/
    │   └── sft.sh: Train a model with Mixture-of-Depths
    ├── llama_pro/
    │   ├── expand.sh: Expand layers in the model
    │   └── sft.sh: Train the expanded model
    └── fsdp_qlora/
        └── sft.sh: Fine-tune a quantized model with FSDP+QLoRA
```
LLaMA-Factory/examples/accelerate/fsdp_config.yaml (new file, mode 100644)

compute_environment: LOCAL_MACHINE
debug: false
distributed_type: FSDP
downcast_bf16: 'no'
fsdp_config:
  fsdp_auto_wrap_policy: TRANSFORMER_BASED_WRAP
  fsdp_backward_prefetch: BACKWARD_PRE
  fsdp_cpu_ram_efficient_loading: true
  fsdp_forward_prefetch: false
  fsdp_offload_params: true
  fsdp_sharding_strategy: FULL_SHARD
  fsdp_state_dict_type: FULL_STATE_DICT
  fsdp_sync_module_states: true
  fsdp_use_orig_params: false
machine_rank: 0
main_training_function: main
mixed_precision: fp16
num_machines: 1 # the number of nodes
num_processes: 2 # the number of GPUs in all nodes
rdzv_backend: static
same_network: true
tpu_env: []
tpu_use_cluster: false
tpu_use_sudo: false
use_cpu: false

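This Accelerate config is consumed through accelerate launch --config_file, as the extras/fsdp_qlora/sft.sh script later in this commit does; num_processes: 2 above corresponds to two visible GPUs. A minimal sketch of that launch pattern (paths as seen from examples/extras/fsdp_qlora/, remaining training flags elided):

```
# Launch FSDP training with the config above; the two visible GPUs
# match num_processes: 2 in fsdp_config.yaml.
CUDA_VISIBLE_DEVICES=0,1 accelerate launch \
    --config_file ../../accelerate/fsdp_config.yaml \
    ../../../src/train_bash.py \
    --stage sft \
    --do_train
    # ...remaining training arguments as in extras/fsdp_qlora/sft.sh
```
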
LLaMA-Factory/examples/accelerate/master_config.yaml (new file, mode 100644)

compute_environment: LOCAL_MACHINE
debug: false
distributed_type: MULTI_GPU
downcast_bf16: 'no'
gpu_ids: all
machine_rank: 0
main_process_ip: 192.168.0.1
main_process_port: 29555
main_training_function: main
mixed_precision: fp16
num_machines: 2 # the number of nodes
num_processes: 8 # the number of GPUs in all nodes
rdzv_backend: static
same_network: true
tpu_env: []
tpu_use_cluster: false
tpu_use_sudo: false
use_cpu: false

LLaMA-Factory/examples/accelerate/single_config.yaml (new file, mode 100644)

compute_environment: LOCAL_MACHINE
debug: false
distributed_type: MULTI_GPU
downcast_bf16: 'no'
gpu_ids: all
machine_rank: 0
main_training_function: main
mixed_precision: fp16
num_machines: 1 # the number of nodes
num_processes: 4 # the number of GPUs in all nodes
rdzv_backend: static
same_network: true
tpu_env: []
tpu_use_cluster: false
tpu_use_sudo: false
use_cpu: false

LLaMA-Factory/examples/accelerate/slave_config.yaml (new file, mode 100644)

compute_environment: LOCAL_MACHINE
debug: false
distributed_type: MULTI_GPU
downcast_bf16: 'no'
gpu_ids: all
machine_rank: 1
main_process_ip: 192.168.0.1
main_process_port: 29555
main_training_function: main
mixed_precision: fp16
num_machines: 2 # the number of nodes
num_processes: 8 # the number of GPUs in all nodes
rdzv_backend: static
same_network: true
tpu_env: []
tpu_use_cluster: false
tpu_use_sudo: false
use_cpu: false

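master_config.yaml (machine_rank: 0) and slave_config.yaml (machine_rank: 1) describe the two halves of a 2-node, 8-GPU Accelerate job that rendezvous at 192.168.0.1:29555. A hedged sketch of how the pair is typically used, one command per node; the training flags here are placeholders, not part of this commit:

```
# On the main node (192.168.0.1), rank 0:
accelerate launch --config_file examples/accelerate/master_config.yaml \
    src/train_bash.py --stage sft --do_train  # ...training arguments

# On the second node, which connects to 192.168.0.1:29555 as rank 1:
accelerate launch --config_file examples/accelerate/slave_config.yaml \
    src/train_bash.py --stage sft --do_train  # ...same training arguments
```
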
LLaMA-Factory/examples/deepspeed/ds_z2_config.json (new file, mode 100644)

{
  "train_batch_size": "auto",
  "train_micro_batch_size_per_gpu": "auto",
  "gradient_accumulation_steps": "auto",
  "gradient_clipping": "auto",
  "zero_allow_untested_optimizer": true,
  "fp16": {
    "enabled": "auto",
    "loss_scale": 0,
    "loss_scale_window": 1000,
    "initial_scale_power": 16,
    "hysteresis": 2,
    "min_loss_scale": 1
  },
  "bf16": {
    "enabled": "auto"
  },
  "zero_optimization": {
    "stage": 2,
    "allgather_partitions": true,
    "allgather_bucket_size": 5e8,
    "overlap_comm": true,
    "reduce_scatter": true,
    "reduce_bucket_size": 5e8,
    "contiguous_gradients": true,
    "round_robin_gradients": true
  }
}

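The DeepSpeed JSON files in this directory are not executed directly; they are handed to the trainer via the --deepspeed flag, as full_multi_gpu/single_node.sh below does with the ZeRO-3 config. A minimal sketch using the ZeRO-2 config above instead (the "auto" fields are resolved from the Trainer arguments at runtime; remaining flags elided):

```
# Full fine-tuning with DeepSpeed ZeRO-2 on 4 GPUs (sketch).
deepspeed --num_gpus 4 ../../src/train_bash.py \
    --deepspeed ../deepspeed/ds_z2_config.json \
    --stage sft \
    --do_train
    # ...remaining training arguments as in full_multi_gpu/single_node.sh
```
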
LLaMA-Factory/examples/deepspeed/ds_z2_offload_config.json (new file, mode 100644)

{
  "train_batch_size": "auto",
  "train_micro_batch_size_per_gpu": "auto",
  "gradient_accumulation_steps": "auto",
  "gradient_clipping": "auto",
  "zero_allow_untested_optimizer": true,
  "fp16": {
    "enabled": "auto",
    "loss_scale": 0,
    "loss_scale_window": 1000,
    "initial_scale_power": 16,
    "hysteresis": 2,
    "min_loss_scale": 1
  },
  "bf16": {
    "enabled": "auto"
  },
  "zero_optimization": {
    "stage": 2,
    "offload_optimizer": {
      "device": "cpu",
      "pin_memory": true
    },
    "allgather_partitions": true,
    "allgather_bucket_size": 5e8,
    "overlap_comm": true,
    "reduce_scatter": true,
    "reduce_bucket_size": 5e8,
    "contiguous_gradients": true,
    "round_robin_gradients": true
  }
}

LLaMA-Factory/examples/deepspeed/ds_z3_config.json (new file, mode 100644)

{
  "train_batch_size": "auto",
  "train_micro_batch_size_per_gpu": "auto",
  "gradient_accumulation_steps": "auto",
  "gradient_clipping": "auto",
  "zero_allow_untested_optimizer": true,
  "fp16": {
    "enabled": "auto",
    "loss_scale": 0,
    "loss_scale_window": 1000,
    "initial_scale_power": 16,
    "hysteresis": 2,
    "min_loss_scale": 2
  },
  "bf16": {
    "enabled": "auto"
  },
  "zero_optimization": {
    "stage": 3,
    "overlap_comm": true,
    "contiguous_gradients": true,
    "sub_group_size": 1e9,
    "reduce_bucket_size": "auto",
    "stage3_prefetch_bucket_size": "auto",
    "stage3_param_persistence_threshold": "auto",
    "stage3_max_live_parameters": 1e9,
    "stage3_max_reuse_distance": 1e9,
    "stage3_gather_16bit_weights_on_model_save": true
  }
}

LLaMA-Factory/examples/deepspeed/ds_z3_offload_config.json (new file, mode 100644)

{
  "train_batch_size": "auto",
  "train_micro_batch_size_per_gpu": "auto",
  "gradient_accumulation_steps": "auto",
  "gradient_clipping": "auto",
  "zero_allow_untested_optimizer": true,
  "fp16": {
    "enabled": "auto",
    "loss_scale": 0,
    "loss_scale_window": 1000,
    "initial_scale_power": 16,
    "hysteresis": 2,
    "min_loss_scale": 1
  },
  "bf16": {
    "enabled": "auto"
  },
  "zero_optimization": {
    "stage": 3,
    "offload_optimizer": {
      "device": "cpu",
      "pin_memory": true
    },
    "offload_param": {
      "device": "cpu",
      "pin_memory": true
    },
    "overlap_comm": true,
    "contiguous_gradients": true,
    "sub_group_size": 1e9,
    "reduce_bucket_size": "auto",
    "stage3_prefetch_bucket_size": "auto",
    "stage3_param_persistence_threshold": "auto",
    "stage3_max_live_parameters": 1e9,
    "stage3_max_reuse_distance": 1e9,
    "stage3_gather_16bit_weights_on_model_save": true
  }
}

LLaMA-Factory/examples/extras/badam/sft.sh (new file, mode 100644)

#!/bin/bash

CUDA_VISIBLE_DEVICES=0 python ../../../src/train_bash.py \
    --stage sft \
    --do_train \
    --model_name_or_path meta-llama/Llama-2-7b-hf \
    --dataset alpaca_gpt4_en,glaive_toolcall \
    --dataset_dir ../../../data \
    --template default \
    --finetuning_type full \
    --use_badam \
    --badam_switch_mode descending \
    --badam_switch_block_every 50 \
    --badam_verbose 2 \
    --output_dir ../../../saves/LLaMA2-7B/badam/sft \
    --overwrite_cache \
    --overwrite_output_dir \
    --cutoff_len 1024 \
    --preprocessing_num_workers 16 \
    --per_device_train_batch_size 1 \
    --per_device_eval_batch_size 1 \
    --gradient_accumulation_steps 8 \
    --lr_scheduler_type cosine \
    --logging_steps 10 \
    --warmup_steps 20 \
    --save_steps 100 \
    --eval_steps 100 \
    --evaluation_strategy steps \
    --load_best_model_at_end \
    --learning_rate 5e-5 \
    --num_train_epochs 3.0 \
    --max_samples 3000 \
    --val_size 0.1 \
    --plot_loss \
    --pure_bf16

LLaMA-Factory/examples/extras/fsdp_qlora/sft.sh (new file, mode 100644)

#!/bin/bash
# DO NOT use GPTQ/AWQ model in FSDP+QLoRA

pip install "transformers>=4.39.1"
pip install "accelerate>=0.28.0"
pip install "bitsandbytes>=0.43.0"

CUDA_VISIBLE_DEVICES=0,1 accelerate launch \
    --config_file ../../accelerate/fsdp_config.yaml \
    ../../../src/train_bash.py \
    --stage sft \
    --do_train \
    --model_name_or_path meta-llama/Llama-2-70b-hf \
    --dataset alpaca_gpt4_en,glaive_toolcall \
    --dataset_dir ../../../data \
    --template default \
    --finetuning_type lora \
    --lora_target q_proj,v_proj \
    --output_dir ../../../saves/LLaMA2-70B/lora/sft \
    --overwrite_cache \
    --overwrite_output_dir \
    --cutoff_len 1024 \
    --preprocessing_num_workers 16 \
    --per_device_train_batch_size 1 \
    --per_device_eval_batch_size 1 \
    --gradient_accumulation_steps 4 \
    --lr_scheduler_type cosine \
    --logging_steps 10 \
    --warmup_steps 20 \
    --save_steps 100 \
    --eval_steps 100 \
    --evaluation_strategy steps \
    --load_best_model_at_end \
    --learning_rate 5e-5 \
    --num_train_epochs 3.0 \
    --max_samples 3000 \
    --val_size 0.1 \
    --ddp_timeout 180000000 \
    --quantization_bit 4 \
    --plot_loss \
    --fp16

LLaMA-Factory/examples/extras/galore/sft.sh (new file, mode 100644)

#!/bin/bash

CUDA_VISIBLE_DEVICES=0 python ../../../src/train_bash.py \
    --stage sft \
    --do_train \
    --model_name_or_path meta-llama/Llama-2-7b-hf \
    --dataset alpaca_gpt4_en,glaive_toolcall \
    --dataset_dir ../../../data \
    --template default \
    --finetuning_type full \
    --use_galore \
    --galore_layerwise \
    --galore_target mlp,self_attn \
    --galore_rank 128 \
    --galore_scale 2.0 \
    --output_dir ../../../saves/LLaMA2-7B/galore/sft \
    --overwrite_cache \
    --overwrite_output_dir \
    --cutoff_len 1024 \
    --preprocessing_num_workers 16 \
    --per_device_train_batch_size 1 \
    --per_device_eval_batch_size 1 \
    --gradient_accumulation_steps 1 \
    --lr_scheduler_type cosine \
    --logging_steps 10 \
    --warmup_steps 20 \
    --save_steps 100 \
    --eval_steps 100 \
    --evaluation_strategy steps \
    --load_best_model_at_end \
    --learning_rate 5e-5 \
    --num_train_epochs 3.0 \
    --max_samples 3000 \
    --val_size 0.1 \
    --plot_loss \
    --pure_bf16

LLaMA-Factory/examples/extras/llama_pro/expand.sh (new file, mode 100644)

#!/bin/bash

python ../../../scripts/llama_pro.py \
    --model_name_or_path meta-llama/Llama-2-7b-hf \
    --output_dir ../../../models/llama2-7b-pro \
    --num_expand 8

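expand.sh writes the expanded checkpoint to ../../../models/llama2-7b-pro, which is exactly the --model_name_or_path that the sft.sh below expects, so the two scripts are meant to run in that order. A short sketch of the sequence, assuming it is run from examples/extras/llama_pro/:

```
# LLaMA Pro workflow (sketch): expand first, then fine-tune the expanded copy.
bash expand.sh   # writes the expanded model to ../../../models/llama2-7b-pro
bash sft.sh      # freeze-tunes the new layers via --use_llama_pro
```
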
LLaMA-Factory/examples/extras/llama_pro/sft.sh (new file, mode 100644)

#!/bin/bash

CUDA_VISIBLE_DEVICES=0 python ../../../src/train_bash.py \
    --stage sft \
    --do_train \
    --model_name_or_path ../../../models/llama2-7b-pro \
    --dataset alpaca_gpt4_en,glaive_toolcall \
    --dataset_dir ../../../data \
    --template default \
    --finetuning_type freeze \
    --name_module_trainable all \
    --num_layer_trainable 8 \
    --use_llama_pro \
    --output_dir ../../../saves/LLaMA2-7B-Pro/lora/sft \
    --overwrite_cache \
    --overwrite_output_dir \
    --cutoff_len 1024 \
    --preprocessing_num_workers 16 \
    --per_device_train_batch_size 1 \
    --per_device_eval_batch_size 1 \
    --gradient_accumulation_steps 8 \
    --lr_scheduler_type cosine \
    --logging_steps 10 \
    --warmup_steps 20 \
    --save_steps 100 \
    --eval_steps 100 \
    --evaluation_strategy steps \
    --load_best_model_at_end \
    --learning_rate 5e-5 \
    --num_train_epochs 3.0 \
    --max_samples 3000 \
    --val_size 0.1 \
    --plot_loss \
    --fp16

LLaMA-Factory/examples/extras/loraplus/sft.sh (new file, mode 100644)

#!/bin/bash

CUDA_VISIBLE_DEVICES=0 python ../../src/train_bash.py \
    --stage sft \
    --do_train \
    --model_name_or_path meta-llama/Llama-2-7b-hf \
    --dataset alpaca_gpt4_en,glaive_toolcall \
    --dataset_dir ../../data \
    --template default \
    --finetuning_type lora \
    --lora_target q_proj,v_proj \
    --loraplus_lr_ratio 16.0 \
    --output_dir ../../saves/LLaMA2-7B/loraplus/sft \
    --overwrite_cache \
    --overwrite_output_dir \
    --cutoff_len 1024 \
    --preprocessing_num_workers 16 \
    --per_device_train_batch_size 1 \
    --per_device_eval_batch_size 1 \
    --gradient_accumulation_steps 8 \
    --lr_scheduler_type cosine \
    --logging_steps 10 \
    --warmup_steps 20 \
    --save_steps 100 \
    --eval_steps 100 \
    --evaluation_strategy steps \
    --load_best_model_at_end \
    --learning_rate 5e-5 \
    --num_train_epochs 3.0 \
    --max_samples 3000 \
    --val_size 0.1 \
    --plot_loss \
    --fp16

LLaMA-Factory/examples/extras/mod/sft.sh (new file, mode 100644)

#!/bin/bash

CUDA_VISIBLE_DEVICES=0 python ../../../src/train_bash.py \
    --stage sft \
    --do_train \
    --model_name_or_path meta-llama/Llama-2-7b-hf \
    --dataset alpaca_gpt4_en,glaive_toolcall \
    --dataset_dir ../../../data \
    --template default \
    --finetuning_type full \
    --mixture_of_depths convert \
    --output_dir ../../../saves/LLaMA2-7B/mod/sft \
    --overwrite_cache \
    --overwrite_output_dir \
    --cutoff_len 1024 \
    --preprocessing_num_workers 16 \
    --per_device_train_batch_size 1 \
    --per_device_eval_batch_size 1 \
    --gradient_accumulation_steps 8 \
    --optim paged_adamw_8bit \
    --lr_scheduler_type cosine \
    --logging_steps 10 \
    --warmup_steps 20 \
    --save_steps 100 \
    --eval_steps 100 \
    --evaluation_strategy steps \
    --load_best_model_at_end \
    --learning_rate 5e-5 \
    --num_train_epochs 3.0 \
    --max_samples 3000 \
    --val_size 0.1 \
    --plot_loss \
    --pure_bf16

LLaMA-Factory/examples/full_multi_gpu/multi_node.sh (new file, mode 100644)

#!/bin/bash

python -m torch.distributed.run \
    --nproc_per_node $NPROC_PER_NODE \
    --nnodes $NNODES \
    --node_rank $RANK \
    --master_addr $MASTER_ADDR \
    --master_port $MASTER_PORT \
    ../../src/train_bash.py \
    --deepspeed ../deepspeed/ds_z3_config.json \
    --stage sft \
    --do_train \
    --model_name_or_path meta-llama/Llama-2-7b-hf \
    --dataset alpaca_gpt4_en,glaive_toolcall \
    --dataset_dir ../../data \
    --template default \
    --finetuning_type full \
    --output_dir ../../saves/LLaMA2-7B/full/sft \
    --overwrite_cache \
    --overwrite_output_dir \
    --cutoff_len 1024 \
    --preprocessing_num_workers 16 \
    --per_device_train_batch_size 1 \
    --per_device_eval_batch_size 1 \
    --gradient_accumulation_steps 2 \
    --lr_scheduler_type cosine \
    --logging_steps 10 \
    --warmup_steps 20 \
    --save_steps 100 \
    --eval_steps 100 \
    --evaluation_strategy steps \
    --learning_rate 5e-5 \
    --num_train_epochs 3.0 \
    --max_samples 3000 \
    --val_size 0.1 \
    --ddp_timeout 180000000 \
    --plot_loss \
    --fp16

LLaMA-Factory/examples/full_multi_gpu/predict.sh (new file, mode 100644)

#!/bin/bash

CUDA_VISIBLE_DEVICES=0,1,2,3 accelerate launch \
    --config_file ../accelerate/single_config.yaml \
    ../../src/train_bash.py \
    --stage sft \
    --do_predict \
    --model_name_or_path ../../saves/LLaMA2-7B/full/sft \
    --dataset alpaca_gpt4_en,glaive_toolcall \
    --dataset_dir ../../data \
    --template default \
    --finetuning_type full \
    --output_dir ../../saves/LLaMA2-7B/full/predict \
    --overwrite_cache \
    --overwrite_output_dir \
    --cutoff_len 1024 \
    --preprocessing_num_workers 16 \
    --per_device_eval_batch_size 1 \
    --max_samples 20 \
    --predict_with_generate

LLaMA-Factory/examples/full_multi_gpu/single_node.sh (new file, mode 100644)

#!/bin/bash

deepspeed --num_gpus 4 ../../src/train_bash.py \
    --deepspeed ../deepspeed/ds_z3_config.json \
    --stage sft \
    --do_train \
    --model_name_or_path meta-llama/Llama-2-7b-hf \
    --dataset alpaca_gpt4_en,glaive_toolcall \
    --dataset_dir ../../data \
    --template default \
    --finetuning_type full \
    --output_dir ../../saves/LLaMA2-7B/full/sft \
    --overwrite_cache \
    --overwrite_output_dir \
    --cutoff_len 1024 \
    --preprocessing_num_workers 16 \
    --per_device_train_batch_size 1 \
    --per_device_eval_batch_size 1 \
    --gradient_accumulation_steps 2 \
    --lr_scheduler_type cosine \
    --logging_steps 10 \
    --warmup_steps 20 \
    --save_steps 100 \
    --eval_steps 100 \
    --evaluation_strategy steps \
    --learning_rate 5e-5 \
    --num_train_epochs 3.0 \
    --max_samples 3000 \
    --val_size 0.1 \
    --ddp_timeout 180000000 \
    --plot_loss \
    --fp16
