ModelZoo / Telechat_pytorch · Commit f7db21eb

Authored Aug 22, 2024 by lvzhen
Commit message: first
Pipeline #1580: canceled with stages
Changes: 674 files
Showing 20 changed files with 484 additions and 0 deletions:

ms-swift/examples/pytorch/llm/scripts/qwen1half_moe_a2_7b/lora/infer.sh (+14)
ms-swift/examples/pytorch/llm/scripts/qwen1half_moe_a2_7b/lora/sft.sh (+31)
ms-swift/examples/pytorch/llm/scripts/qwen1half_moe_a2_7b_chat/lora/infer.sh (+14)
ms-swift/examples/pytorch/llm/scripts/qwen1half_moe_a2_7b_chat/lora/sft.sh (+31)
ms-swift/examples/pytorch/llm/scripts/qwen1half_moe_a2_7b_chat_int4/qlora/infer.sh (+12)
ms-swift/examples/pytorch/llm/scripts/qwen1half_moe_a2_7b_chat_int4/qlora/sft.sh (+28)
ms-swift/examples/pytorch/llm/scripts/qwen_14b/lora_ddp_ds/infer.sh (+13)
ms-swift/examples/pytorch/llm/scripts/qwen_14b/lora_ddp_ds/sft.sh (+40)
ms-swift/examples/pytorch/llm/scripts/qwen_14b/qlora/infer.sh (+13)
ms-swift/examples/pytorch/llm/scripts/qwen_14b/qlora/sft.sh (+35)
ms-swift/examples/pytorch/llm/scripts/qwen_14b/qlora_ddp_ds/infer.sh (+13)
ms-swift/examples/pytorch/llm/scripts/qwen_14b/qlora_ddp_ds/sft.sh (+42)
ms-swift/examples/pytorch/llm/scripts/qwen_14b_chat/full_ddp_zero3/infer.sh (+12)
ms-swift/examples/pytorch/llm/scripts/qwen_14b_chat/full_ddp_zero3/sft.sh (+35)
ms-swift/examples/pytorch/llm/scripts/qwen_14b_chat/lora_ddp_ds/infer.sh (+13)
ms-swift/examples/pytorch/llm/scripts/qwen_14b_chat/lora_ddp_ds/sft.sh (+40)
ms-swift/examples/pytorch/llm/scripts/qwen_14b_chat/lora_ddp_zero3/infer.sh (+13)
ms-swift/examples/pytorch/llm/scripts/qwen_14b_chat/lora_ddp_zero3/sft.sh (+38)
ms-swift/examples/pytorch/llm/scripts/qwen_14b_chat/qlora/infer.sh (+12)
ms-swift/examples/pytorch/llm/scripts/qwen_14b_chat/qlora/sft.sh (+35)
ms-swift/examples/pytorch/llm/scripts/qwen1half_moe_a2_7b/lora/infer.sh (new file, mode 100644)

# Experimental environment: A100
# 36GB GPU memory
PYTHONPATH=../../.. \
CUDA_VISIBLE_DEVICES=0 \
python llm_infer.py \
    --ckpt_dir "output/qwen1half-moe-a2_7b/vx-xxx/checkpoint-xxx" \
    --load_dataset_config true \
    --use_flash_attn true \
    --max_new_tokens 2048 \
    --temperature 0.1 \
    --top_p 0.7 \
    --repetition_penalty 1. \
    --do_sample true \
    --merge_lora false \
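These inference scripts keep --merge_lora false, so the LoRA adapter in --ckpt_dir is applied on top of the frozen base weights at load time. If a standalone merged checkpoint is needed (for example, for a serving stack that cannot load adapters), ms-swift also ships an export entry point. A minimal sketch, assuming the `swift export` subcommand with a `--merge_lora` flag as in recent ms-swift releases, with the placeholder checkpoint path left as-is:

swift export \
    --ckpt_dir "output/qwen1half-moe-a2_7b/vx-xxx/checkpoint-xxx" \
    --merge_lora true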
ms-swift/examples/pytorch/llm/scripts/qwen1half_moe_a2_7b/lora/sft.sh (new file, mode 100644)

# Experimental environment: A100
# 42GB GPU memory
PYTHONPATH=../../.. \
CUDA_VISIBLE_DEVICES=0 \
python llm_sft.py \
    --model_type qwen1half-moe-a2_7b \
    --sft_type lora \
    --tuner_backend peft \
    --dtype AUTO \
    --output_dir output \
    --dataset dureader-robust-zh \
    --train_dataset_sample -1 \
    --num_train_epochs 1 \
    --max_length 1024 \
    --check_dataset_strategy warning \
    --lora_rank 8 \
    --lora_alpha 32 \
    --lora_dropout_p 0.05 \
    --lora_target_modules ALL \
    --gradient_checkpointing true \
    --batch_size 1 \
    --weight_decay 0.1 \
    --learning_rate 1e-4 \
    --gradient_accumulation_steps 16 \
    --max_grad_norm 0.5 \
    --warmup_ratio 0.03 \
    --eval_steps 100 \
    --save_steps 100 \
    --save_total_limit 2 \
    --logging_steps 10 \
    --use_flash_attn true \
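With --batch_size 1 and --gradient_accumulation_steps 16 on a single GPU, each optimizer step sees an effective batch of 16 samples. A quick sanity check of that arithmetic in shell (illustrative only, not part of the committed script):

batch_size=1; grad_accum=16; world_size=1
echo "effective batch size: $(( batch_size * grad_accum * world_size ))"  # prints 16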
ms-swift/examples/pytorch/llm/scripts/qwen1half_moe_a2_7b_chat/lora/infer.sh (new file, mode 100644)

# Experimental environment: A100
# 36GB GPU memory
PYTHONPATH=../../.. \
CUDA_VISIBLE_DEVICES=0 \
python llm_infer.py \
    --ckpt_dir "output/qwen1half-moe-a2_7b-chat/vx-xxx/checkpoint-xxx" \
    --load_dataset_config true \
    --use_flash_attn true \
    --max_new_tokens 2048 \
    --temperature 0.1 \
    --top_p 0.7 \
    --repetition_penalty 1. \
    --do_sample true \
    --merge_lora false \
ms-swift/examples/pytorch/llm/scripts/qwen1half_moe_a2_7b_chat/lora/sft.sh (new file, mode 100644)

# Experimental environment: A100
# 42GB GPU memory
PYTHONPATH=../../.. \
CUDA_VISIBLE_DEVICES=0 \
python llm_sft.py \
    --model_type qwen1half-moe-a2_7b-chat \
    --sft_type lora \
    --tuner_backend peft \
    --dtype AUTO \
    --output_dir output \
    --dataset blossom-math-zh \
    --train_dataset_sample -1 \
    --num_train_epochs 1 \
    --max_length 1024 \
    --check_dataset_strategy warning \
    --lora_rank 8 \
    --lora_alpha 32 \
    --lora_dropout_p 0.05 \
    --lora_target_modules ALL \
    --gradient_checkpointing true \
    --batch_size 1 \
    --weight_decay 0.1 \
    --learning_rate 1e-4 \
    --gradient_accumulation_steps 16 \
    --max_grad_norm 0.5 \
    --warmup_ratio 0.03 \
    --eval_steps 100 \
    --save_steps 100 \
    --save_total_limit 2 \
    --logging_steps 10 \
    --use_flash_attn true \
ms-swift/examples/pytorch/llm/scripts/qwen1half_moe_a2_7b_chat_int4/qlora/infer.sh (new file, mode 100644)

# Experimental environment: A100
CUDA_VISIBLE_DEVICES=0 \
swift infer \
    --ckpt_dir "output/qwen1half-moe-a2_7b-chat-int4/vx-xxx/checkpoint-xxx" \
    --load_dataset_config true \
    --use_flash_attn true \
    --max_new_tokens 2048 \
    --temperature 0.1 \
    --top_p 0.7 \
    --repetition_penalty 1. \
    --do_sample true \
    --merge_lora false \
ms-swift/examples/pytorch/llm/scripts/qwen1half_moe_a2_7b_chat_int4/qlora/sft.sh (new file, mode 100644)

# Experimental environment: A100
# 17GB GPU memory
CUDA_VISIBLE_DEVICES=0 \
swift sft \
    --model_type qwen1half-moe-a2_7b-chat-int4 \
    --sft_type lora \
    --output_dir output \
    --dataset blossom-math-zh \
    --train_dataset_sample -1 \
    --num_train_epochs 3 \
    --max_length 2048 \
    --lora_rank 8 \
    --lora_alpha 32 \
    --lora_dropout_p 0.05 \
    --lora_target_modules ALL \
    --gradient_checkpointing true \
    --batch_size 1 \
    --weight_decay 0.1 \
    --learning_rate 1e-4 \
    --gradient_accumulation_steps 16 \
    --max_grad_norm 0.5 \
    --warmup_ratio 0.03 \
    --eval_steps 100 \
    --save_steps 100 \
    --save_total_limit 2 \
    --logging_steps 10 \
    --use_flash_attn true \
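Unlike the preceding scripts, the int4 pair calls the installed `swift` CLI directly instead of running llm_sft.py/llm_infer.py from the repo tree via PYTHONPATH; both routes drive the same trainer. A minimal environment sketch, assuming ms-swift is installed from PyPI:

# Install ms-swift and confirm the CLI entry points resolve.
pip install ms-swift
swift sft --help
swift infer --help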
ms-swift/examples/pytorch/llm/scripts/qwen_14b/lora_ddp_ds/infer.sh (new file, mode 100644)

# Experimental environment: A100
PYTHONPATH=../../.. \
CUDA_VISIBLE_DEVICES=0 \
python llm_infer.py \
    --ckpt_dir "output/qwen-14b/vx-xxx/checkpoint-xxx" \
    --load_dataset_config true \
    --use_flash_attn true \
    --max_new_tokens 2048 \
    --temperature 0.7 \
    --top_p 0.7 \
    --repetition_penalty 1. \
    --do_sample true \
    --merge_lora false \
ms-swift/examples/pytorch/llm/scripts/qwen_14b/lora_ddp_ds/sft.sh (new file, mode 100644)

# Experimental environment: 2 * A100
# 2 * 32GB GPU memory (use flash_attn)
nproc_per_node=2
PYTHONPATH=../../.. \
CUDA_VISIBLE_DEVICES=0,1 \
torchrun \
    --nproc_per_node=$nproc_per_node \
    --master_port 29500 \
    llm_sft.py \
    --model_id_or_path qwen/Qwen-14B \
    --model_revision master \
    --sft_type lora \
    --tuner_backend peft \
    --template_type default-generation \
    --dtype AUTO \
    --output_dir output \
    --ddp_backend nccl \
    --dataset dureader-robust-zh \
    --train_dataset_sample -1 \
    --num_train_epochs 1 \
    --max_length 2048 \
    --check_dataset_strategy warning \
    --lora_rank 8 \
    --lora_alpha 32 \
    --lora_dropout_p 0.05 \
    --lora_target_modules ALL \
    --gradient_checkpointing true \
    --batch_size 1 \
    --weight_decay 0.1 \
    --learning_rate 1e-4 \
    --gradient_accumulation_steps $(expr 16 / $nproc_per_node) \
    --max_grad_norm 0.5 \
    --warmup_ratio 0.03 \
    --eval_steps 100 \
    --save_steps 100 \
    --save_total_limit 2 \
    --logging_steps 10 \
    --use_flash_attn true \
    --deepspeed default-zero2 \
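The $(expr 16 / $nproc_per_node) division pins the global effective batch size at 16 regardless of the data-parallel world size: each of the 2 GPUs runs --batch_size 1 and accumulates 8 micro-batches, and 1 * 8 * 2 = 16. The same arithmetic with modern shell expansion (a sketch; the committed script uses the legacy `expr` form):

nproc_per_node=2
accum=$(( 16 / nproc_per_node ))  # per-GPU accumulation steps
echo "global batch: $(( 1 * accum * nproc_per_node ))"  # prints 16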
ms-swift/examples/pytorch/llm/scripts/qwen_14b/qlora/infer.sh (new file, mode 100644)

# Experimental environment: A10
PYTHONPATH=../../.. \
CUDA_VISIBLE_DEVICES=0 \
python llm_infer.py \
    --ckpt_dir "output/qwen-14b/vx-xxx/checkpoint-xxx" \
    --load_dataset_config true \
    --use_flash_attn false \
    --max_new_tokens 2048 \
    --temperature 0.7 \
    --top_p 0.7 \
    --repetition_penalty 1. \
    --do_sample true \
    --merge_lora false \
ms-swift/examples/pytorch/llm/scripts/qwen_14b/qlora/sft.sh (new file, mode 100644)

# Experimental environment: A10
# 17GB GPU memory
PYTHONPATH=../../.. \
CUDA_VISIBLE_DEVICES=0 \
python llm_sft.py \
    --model_id_or_path qwen/Qwen-14B \
    --model_revision master \
    --sft_type lora \
    --tuner_backend peft \
    --template_type default-generation \
    --dtype AUTO \
    --output_dir output \
    --dataset dureader-robust-zh \
    --train_dataset_sample -1 \
    --num_train_epochs 1 \
    --max_length 2048 \
    --check_dataset_strategy warning \
    --quantization_bit 4 \
    --bnb_4bit_comp_dtype AUTO \
    --lora_rank 8 \
    --lora_alpha 32 \
    --lora_dropout_p 0.05 \
    --lora_target_modules ALL \
    --gradient_checkpointing true \
    --batch_size 1 \
    --weight_decay 0.1 \
    --learning_rate 1e-4 \
    --gradient_accumulation_steps 16 \
    --max_grad_norm 0.5 \
    --warmup_ratio 0.03 \
    --eval_steps 100 \
    --save_steps 100 \
    --save_total_limit 2 \
    --logging_steps 10 \
    --use_flash_attn false \
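Here --quantization_bit 4 with --bnb_4bit_comp_dtype AUTO loads the frozen Qwen-14B base in 4-bit via bitsandbytes (the QLoRA recipe), which is what fits a 14B fine-tune into roughly 17GB on an A10. A quick preflight check that the quantization backend is importable (an illustrative one-liner, not from the commit):

python -c "import bitsandbytes; print(bitsandbytes.__version__)"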
ms-swift/examples/pytorch/llm/scripts/qwen_14b/qlora_ddp_ds/infer.sh (new file, mode 100644)

# Experimental environment: A10
PYTHONPATH=../../.. \
CUDA_VISIBLE_DEVICES=0 \
python llm_infer.py \
    --ckpt_dir "output/qwen-14b/vx-xxx/checkpoint-xxx" \
    --load_dataset_config true \
    --use_flash_attn false \
    --max_new_tokens 2048 \
    --temperature 0.7 \
    --top_p 0.7 \
    --repetition_penalty 1. \
    --do_sample true \
    --merge_lora false \
ms-swift/examples/pytorch/llm/scripts/qwen_14b/qlora_ddp_ds/sft.sh (new file, mode 100644)

# Experimental environment: 2 * A10
# 2 * 14GB GPU memory
nproc_per_node=2
PYTHONPATH=../../.. \
CUDA_VISIBLE_DEVICES=0,1 \
torchrun \
    --nproc_per_node=$nproc_per_node \
    --master_port 29500 \
    llm_sft.py \
    --model_id_or_path qwen/Qwen-14B \
    --model_revision master \
    --sft_type lora \
    --tuner_backend peft \
    --template_type default-generation \
    --dtype AUTO \
    --output_dir output \
    --ddp_backend nccl \
    --dataset dureader-robust-zh \
    --train_dataset_sample -1 \
    --num_train_epochs 1 \
    --max_length 2048 \
    --check_dataset_strategy warning \
    --quantization_bit 4 \
    --bnb_4bit_comp_dtype AUTO \
    --lora_rank 8 \
    --lora_alpha 32 \
    --lora_dropout_p 0.05 \
    --lora_target_modules ALL \
    --gradient_checkpointing true \
    --batch_size 1 \
    --weight_decay 0.1 \
    --learning_rate 1e-4 \
    --gradient_accumulation_steps $(expr 16 / $nproc_per_node) \
    --max_grad_norm 0.5 \
    --warmup_ratio 0.03 \
    --eval_steps 100 \
    --save_steps 100 \
    --save_total_limit 2 \
    --logging_steps 10 \
    --use_flash_attn false \
    --deepspeed default-zero2 \
ms-swift/examples/pytorch/llm/scripts/qwen_14b_chat/full_ddp_zero3/infer.sh (new file, mode 100644)

# Experimental environment: A100
CUDA_VISIBLE_DEVICES=0 \
swift infer \
    --ckpt_dir "output/qwen-14b-chat/vx-xxx/checkpoint-xxx" \
    --load_dataset_config true \
    --use_flash_attn true \
    --max_new_tokens 2048 \
    --temperature 0.1 \
    --top_p 0.7 \
    --repetition_penalty 1. \
    --do_sample true \
    --merge_lora false \
ms-swift/examples/pytorch/llm/scripts/qwen_14b_chat/full_ddp_zero3/sft.sh (new file, mode 100644)

# Experimental environment: 4 * A100
# 4 * 78GB GPU memory
nproc_per_node=4
NPROC_PER_NODE=$nproc_per_node \
MASTER_PORT=29500 \
CUDA_VISIBLE_DEVICES=0,1,2,3 \
swift sft \
    --model_id_or_path qwen/Qwen-14B-Chat \
    --model_revision master \
    --sft_type full \
    --tuner_backend peft \
    --template_type AUTO \
    --dtype AUTO \
    --output_dir output \
    --ddp_backend nccl \
    --dataset blossom-math-zh \
    --train_dataset_sample -1 \
    --num_train_epochs 5 \
    --max_length 2048 \
    --check_dataset_strategy warning \
    --gradient_checkpointing true \
    --batch_size 1 \
    --weight_decay 0.1 \
    --learning_rate 1e-4 \
    --gradient_accumulation_steps $(expr 64 / $nproc_per_node) \
    --max_grad_norm 0.5 \
    --warmup_ratio 0.03 \
    --eval_steps 100 \
    --save_steps 100 \
    --save_total_limit 2 \
    --logging_steps 10 \
    --use_flash_attn true \
    --deepspeed 'default-zero3' \
    --save_only_model true \
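This is the one full-parameter fine-tune in the commit: `default-zero3` selects ms-swift's bundled DeepSpeed ZeRO stage-3 preset, sharding parameters, gradients, and optimizer state across the 4 GPUs, and --save_only_model true keeps checkpoints small by omitting optimizer state (at the cost of not being able to resume mid-run). Note the larger global batch here: 1 * (64 / 4) * 4 = 64. A preflight check that DeepSpeed is importable (illustrative, not from the commit):

python -c "import deepspeed; print(deepspeed.__version__)"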
ms-swift/examples/pytorch/llm/scripts/qwen_14b_chat/lora_ddp_ds/infer.sh (new file, mode 100644)

# Experimental environment: A100
PYTHONPATH=../../.. \
CUDA_VISIBLE_DEVICES=0 \
python llm_infer.py \
    --ckpt_dir "output/qwen-14b-chat/vx-xxx/checkpoint-xxx" \
    --load_dataset_config true \
    --use_flash_attn true \
    --max_new_tokens 2048 \
    --temperature 0.1 \
    --top_p 0.7 \
    --repetition_penalty 1. \
    --do_sample true \
    --merge_lora false \
ms-swift/examples/pytorch/llm/scripts/qwen_14b_chat/lora_ddp_ds/sft.sh (new file, mode 100644)

# Experimental environment: 2 * A100
# 2 * 30GB GPU memory (use flash_attn)
nproc_per_node=2
PYTHONPATH=../../.. \
CUDA_VISIBLE_DEVICES=0,1 \
torchrun \
    --nproc_per_node=$nproc_per_node \
    --master_port 29500 \
    llm_sft.py \
    --model_id_or_path qwen/Qwen-14B-Chat \
    --model_revision master \
    --sft_type lora \
    --tuner_backend peft \
    --template_type AUTO \
    --dtype AUTO \
    --output_dir output \
    --ddp_backend nccl \
    --dataset blossom-math-zh \
    --train_dataset_sample -1 \
    --num_train_epochs 1 \
    --max_length 2048 \
    --check_dataset_strategy warning \
    --lora_rank 8 \
    --lora_alpha 32 \
    --lora_dropout_p 0.05 \
    --lora_target_modules ALL \
    --gradient_checkpointing true \
    --batch_size 1 \
    --weight_decay 0.1 \
    --learning_rate 1e-4 \
    --gradient_accumulation_steps $(expr 16 / $nproc_per_node) \
    --max_grad_norm 0.5 \
    --warmup_ratio 0.03 \
    --eval_steps 100 \
    --save_steps 100 \
    --save_total_limit 2 \
    --logging_steps 10 \
    --use_flash_attn true \
    --deepspeed default-zero2 \
ms-swift/examples/pytorch/llm/scripts/qwen_14b_chat/lora_ddp_zero3/infer.sh (new file, mode 100644)

# Experimental environment: 2 * 3090
CUDA_VISIBLE_DEVICES=0,1 \
swift infer \
    --ckpt_dir "output/qwen-14b-chat/vx-xxx/checkpoint-xxx" \
    --load_dataset_config true \
    --use_flash_attn false \
    --max_new_tokens 2048 \
    --temperature 0.1 \
    --top_p 0.7 \
    --repetition_penalty 1. \
    --do_sample true \
    --merge_lora false \
ms-swift/examples/pytorch/llm/scripts/qwen_14b_chat/lora_ddp_zero3/sft.sh (new file, mode 100644)

# Experimental environment: 4 * 3090
# 4 * 24GB GPU memory
nproc_per_node=4
CUDA_VISIBLE_DEVICES=0,1,2,3 \
NPROC_PER_NODE=$nproc_per_node \
MASTER_PORT=29500 \
swift sft \
    --model_id_or_path qwen/Qwen-14B-Chat \
    --model_revision master \
    --sft_type lora \
    --tuner_backend peft \
    --template_type AUTO \
    --dtype AUTO \
    --output_dir output \
    --ddp_backend nccl \
    --dataset blossom-math-zh \
    --train_dataset_sample -1 \
    --num_train_epochs 5 \
    --max_length 2048 \
    --check_dataset_strategy warning \
    --lora_rank 8 \
    --lora_alpha 32 \
    --lora_dropout_p 0.05 \
    --lora_target_modules ALL \
    --gradient_checkpointing true \
    --batch_size 1 \
    --weight_decay 0.1 \
    --learning_rate 1e-4 \
    --gradient_accumulation_steps $(expr 16 / $nproc_per_node) \
    --max_grad_norm 0.5 \
    --warmup_ratio 0.03 \
    --eval_steps 100 \
    --save_steps 100 \
    --save_total_limit 2 \
    --logging_steps 10 \
    --use_flash_attn false \
    --deepspeed default-zero3 \
ms-swift/examples/pytorch/llm/scripts/qwen_14b_chat/qlora/infer.sh (new file, mode 100644)

# Experimental environment: A10, 3090
CUDA_VISIBLE_DEVICES=0 \
swift infer \
    --ckpt_dir "output/qwen-14b-chat/vx-xxx/checkpoint-xxx" \
    --load_dataset_config true \
    --use_flash_attn false \
    --max_new_tokens 2048 \
    --temperature 0.1 \
    --top_p 0.7 \
    --repetition_penalty 1. \
    --do_sample true \
    --merge_lora false \
ms-swift/examples/pytorch/llm/scripts/qwen_14b_chat/qlora/sft.sh (new file, mode 100644)

# Experimental environment: A10, 3090
# 16GB GPU memory
# Recommended to use `qwen_14b_chat_int4`
CUDA_VISIBLE_DEVICES=0 \
swift sft \
    --model_id_or_path qwen/Qwen-14B-Chat \
    --model_revision master \
    --sft_type lora \
    --tuner_backend peft \
    --template_type AUTO \
    --dtype AUTO \
    --output_dir output \
    --dataset blossom-math-zh \
    --train_dataset_sample -1 \
    --num_train_epochs 1 \
    --max_length 2048 \
    --check_dataset_strategy warning \
    --quantization_bit 4 \
    --bnb_4bit_comp_dtype AUTO \
    --lora_rank 8 \
    --lora_alpha 32 \
    --lora_dropout_p 0.05 \
    --lora_target_modules ALL \
    --gradient_checkpointing true \
    --batch_size 1 \
    --weight_decay 0.1 \
    --learning_rate 1e-4 \
    --gradient_accumulation_steps 16 \
    --max_grad_norm 0.5 \
    --warmup_ratio 0.03 \
    --eval_steps 100 \
    --save_steps 100 \
    --save_total_limit 2 \
    --logging_steps 10 \
    --use_flash_attn false \
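The header comment recommends the pre-quantized `qwen_14b_chat_int4` variant over quantizing the fp16 checkpoint on the fly with bitsandbytes, mirroring the int4 scripts earlier in this commit. A hedged sketch of that swap, assuming the `qwen-14b-chat-int4` model type is registered in this ms-swift version (all remaining arguments keep their defaults):

# Hypothetical variant of the script above: start from the int4 checkpoint
# instead of applying --quantization_bit 4 to qwen/Qwen-14B-Chat.
swift sft \
    --model_type qwen-14b-chat-int4 \
    --sft_type lora \
    --dataset blossom-math-zh \
    --output_dir output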