ModelZoo / GLM-4V_pytorch
Commit 1bfbcff0, authored Jun 13, 2024 by wanglch. Initial commit. Pipeline #1204: canceled with stages.
Showing 20 of 707 changed files, with 480 additions and 0 deletions.
swift-main/examples/pytorch/llm/scripts/codefuse_codellama_34b/lora/sft.sh (+32, -0)
swift-main/examples/pytorch/llm/scripts/codegeex2_6b/lora_ddp_ds/infer.sh (+12, -0)
swift-main/examples/pytorch/llm/scripts/codegeex2_6b/lora_ddp_ds/sft.sh (+37, -0)
swift-main/examples/pytorch/llm/scripts/codeqwen1half_7b_chat/lora/infer.sh (+11, -0)
swift-main/examples/pytorch/llm/scripts/codeqwen1half_7b_chat/lora/sft.sh (+30, -0)
swift-main/examples/pytorch/llm/scripts/codeqwen1half_7b_chat_awq/lora/infer.sh (+13, -0)
swift-main/examples/pytorch/llm/scripts/codeqwen1half_7b_chat_awq/lora/sft.sh (+24, -0)
swift-main/examples/pytorch/llm/scripts/cogagent_18b_chat/lora/infer.sh (+12, -0)
swift-main/examples/pytorch/llm/scripts/cogagent_18b_chat/lora/sft.sh (+28, -0)
swift-main/examples/pytorch/llm/scripts/custom/tigerbot_13b_chat/qlora_ddp_ds/infer.sh (+12, -0)
swift-main/examples/pytorch/llm/scripts/custom/tigerbot_13b_chat/qlora_ddp_ds/sft.sh (+39, -0)
swift-main/examples/pytorch/llm/scripts/custom/tigerbot_7b/lora_ddp_ds/infer.sh (+12, -0)
swift-main/examples/pytorch/llm/scripts/custom/tigerbot_7b/lora_ddp_ds/sft.sh (+37, -0)
swift-main/examples/pytorch/llm/scripts/dbrx-instruct/lora_mp/infer.sh (+12, -0)
swift-main/examples/pytorch/llm/scripts/dbrx-instruct/lora_mp/sft.sh (+33, -0)
swift-main/examples/pytorch/llm/scripts/deepseek-v2-chat/lora_mp/deepseek2_device_map.json (+65, -0)
swift-main/examples/pytorch/llm/scripts/deepseek-v2-chat/lora_mp/infer.sh (+16, -0)
swift-main/examples/pytorch/llm/scripts/deepseek-v2-chat/lora_mp/sft.sh (+32, -0)
swift-main/examples/pytorch/llm/scripts/deepseek_moe_16b_chat/lora/infer.sh (+11, -0)
swift-main/examples/pytorch/llm/scripts/deepseek_moe_16b_chat/lora/sft.sh (+12, -0)
swift-main/examples/pytorch/llm/scripts/codefuse_codellama_34b/lora/sft.sh (new file, mode 100644)

# Experimental environment: V100, A10, 3090
# 18GB GPU memory
PYTHONPATH=../../.. \
CUDA_VISIBLE_DEVICES=0 \
python llm_sft.py \
    --model_type codefuse-codellama-34b-chat \
    --sft_type lora \
    --tuner_backend peft \
    --template_type AUTO \
    --dtype fp16 \
    --output_dir output \
    --dataset xxx.jsonl \
    --val_dataset yyy.jsonl \
    --num_train_epochs 1 \
    --max_length 4096 \
    --check_dataset_strategy warning \
    --lora_rank 8 \
    --lora_alpha 32 \
    --lora_dropout_p 0.05 \
    --lora_target_modules DEFAULT \
    --gradient_checkpointing true \
    --batch_size 1 \
    --weight_decay 0.1 \
    --learning_rate 1e-4 \
    --gradient_accumulation_steps 16 \
    --max_grad_norm 0.5 \
    --warmup_ratio 0.03 \
    --eval_steps 100 \
    --save_steps 100 \
    --save_total_limit 2 \
    --logging_steps 10 \
    --use_flash_attn true
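The --lora_* flags above map onto a standard PEFT adapter configuration. A minimal Python sketch of the equivalent, assuming peft's LoraConfig; swift resolves the DEFAULT target modules per model type, so the module list here is only illustrative:

from peft import LoraConfig

# Illustrative PEFT equivalent of the --lora_* flags; the target_modules
# behind DEFAULT are resolved by swift per model, so this list is a guess.
lora_config = LoraConfig(
    r=8,                # --lora_rank 8
    lora_alpha=32,      # --lora_alpha 32
    lora_dropout=0.05,  # --lora_dropout_p 0.05
    target_modules=["q_proj", "k_proj", "v_proj"],
    task_type="CAUSAL_LM",
)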
swift-main/examples/pytorch/llm/scripts/codegeex2_6b/lora_ddp_ds/infer.sh (new file, mode 100644)

# Experimental environment: 3090
PYTHONPATH=../../.. \
CUDA_VISIBLE_DEVICES=0 \
python llm_infer.py \
    --ckpt_dir "output/codegeex2-6b/vx-xxx/checkpoint-xxx" \
    --load_dataset_config true \
    --max_new_tokens 2048 \
    --temperature 0.1 \
    --top_p 0.7 \
    --repetition_penalty 1. \
    --do_sample true \
    --merge_lora false
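The sampling flags in these infer.sh scripts correspond to a standard Hugging Face generation setup. A minimal sketch, assuming transformers' GenerationConfig rather than swift's internal plumbing:

from transformers import GenerationConfig

# Illustrative equivalent of the sampling flags above.
generation_config = GenerationConfig(
    max_new_tokens=2048,
    temperature=0.1,
    top_p=0.7,
    repetition_penalty=1.0,  # "--repetition_penalty 1." disables the penalty
    do_sample=True,
)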
swift-main/examples/pytorch/llm/scripts/codegeex2_6b/lora_ddp_ds/sft.sh (new file, mode 100644)

# Experimental environment: 2 * 3090
# 2 * 20GB GPU memory
nproc_per_node=2
PYTHONPATH=../../.. \
CUDA_VISIBLE_DEVICES=0,1 \
torchrun \
    --nproc_per_node=$nproc_per_node \
    --master_port 29500 \
    llm_sft.py \
    --model_type codegeex2-6b \
    --model_revision master \
    --sft_type lora \
    --tuner_backend peft \
    --dtype AUTO \
    --output_dir output \
    --ddp_backend nccl \
    --dataset leetcode-python-en \
    --num_train_epochs 1 \
    --max_length 4096 \
    --check_dataset_strategy warning \
    --lora_rank 8 \
    --lora_alpha 32 \
    --lora_dropout_p 0.05 \
    --lora_target_modules DEFAULT \
    --gradient_checkpointing true \
    --batch_size 1 \
    --weight_decay 0.1 \
    --learning_rate 1e-4 \
    --gradient_accumulation_steps $(expr 16 / $nproc_per_node) \
    --max_grad_norm 0.5 \
    --warmup_ratio 0.03 \
    --eval_steps 100 \
    --save_steps 100 \
    --save_total_limit 2 \
    --logging_steps 10 \
    --deepspeed default-zero2
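The $(expr 16 / $nproc_per_node) expression keeps the effective global batch size identical to the single-GPU scripts (1 x 16). A short Python sketch of the arithmetic; the variable names are illustrative:

# Effective global batch size under DDP:
# global_batch = batch_size * nproc_per_node * gradient_accumulation_steps
batch_size = 1
nproc_per_node = 2
gradient_accumulation_steps = 16 // nproc_per_node  # mirrors $(expr 16 / $nproc_per_node)
assert batch_size * nproc_per_node * gradient_accumulation_steps == 16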
swift-main/examples/pytorch/llm/scripts/codeqwen1half_7b_chat/lora/infer.sh (new file, mode 100644)

# Experimental environment: 3090
CUDA_VISIBLE_DEVICES=0 \
swift infer \
    --ckpt_dir "output/codeqwen1half-7b-chat/vx-xxx/checkpoint-xxx" \
    --load_dataset_config true \
    --max_new_tokens 2048 \
    --temperature 0.1 \
    --top_p 0.7 \
    --repetition_penalty 1. \
    --do_sample true \
    --merge_lora false
swift-main/examples/pytorch/llm/scripts/codeqwen1half_7b_chat/lora/sft.sh (new file, mode 100644)

# Experimental environment: 3090, A10, V100, ...
# 20GB GPU memory
CUDA_VISIBLE_DEVICES=0 \
swift sft \
    --model_type codeqwen1half-7b-chat \
    --model_revision master \
    --sft_type lora \
    --tuner_backend peft \
    --dtype AUTO \
    --output_dir output \
    --ddp_backend nccl \
    --dataset leetcode-python-en \
    --num_train_epochs 3 \
    --max_length 2048 \
    --check_dataset_strategy warning \
    --lora_rank 8 \
    --lora_alpha 32 \
    --lora_dropout_p 0.05 \
    --lora_target_modules DEFAULT \
    --gradient_checkpointing true \
    --batch_size 1 \
    --weight_decay 0.1 \
    --learning_rate 1e-4 \
    --gradient_accumulation_steps 16 \
    --max_grad_norm 0.5 \
    --warmup_ratio 0.03 \
    --eval_steps 100 \
    --save_steps 100 \
    --save_total_limit 2 \
    --logging_steps 10
swift-main/examples/pytorch/llm/scripts/codeqwen1half_7b_chat_awq/lora/infer.sh (new file, mode 100644)

# Experiment env: A10, RTX3090/4090, A100
CUDA_VISIBLE_DEVICES=0 \
swift infer \
    --ckpt_dir "output/codeqwen1half-7b-chat-awq/vx-xxx/checkpoint-xxx" \
    --load_dataset_config true \
    --use_flash_attn false \
    --max_new_tokens 2048 \
    --temperature 0.1 \
    --top_p 0.7 \
    --repetition_penalty 1. \
    --do_sample true \
    --stream false \
    --merge_lora false
swift-main/examples/pytorch/llm/scripts/codeqwen1half_7b_chat_awq/lora/sft.sh (new file, mode 100644)

# Experiment env: A10, RTX3090/4090, A100
CUDA_VISIBLE_DEVICES=0 \
swift sft \
    --model_type codeqwen1half-7b-chat-awq \
    --dataset leetcode-python-en \
    --batch_size 4 \
    --max_length 2048 \
    --gradient_accumulation_steps 2 \
    --learning_rate 5e-5 \
    --use_flash_attn true \
    --eval_steps 2000 \
    --save_steps 2000 \
    --num_train_epochs 3 \
    --check_dataset_strategy none \
    --gradient_checkpointing true \
    --weight_decay 0.1 \
    --max_grad_norm 1.0 \
    --warmup_ratio 0.03 \
    --save_total_limit 2 \
    --logging_steps 10 \
    --sft_type lora \
    --lora_target_modules ALL \
    --lora_rank 8 \
    --lora_alpha 32
swift-main/examples/pytorch/llm/scripts/cogagent_18b_chat/lora/infer.sh (new file, mode 100644)

# Experimental environment: V100, A10, 3090
CUDA_VISIBLE_DEVICES=0 \
swift infer \
    --ckpt_dir "output/cogagent-18b-chat/vx-xxx/checkpoint-xx" \
    --load_args_from_ckpt_dir true \
    --eval_human true \
    --max_new_tokens 2048 \
    --temperature 0.3 \
    --top_p 0.7 \
    --repetition_penalty 1. \
    --do_sample true \
    --merge_lora false
swift-main/examples/pytorch/llm/scripts/cogagent_18b_chat/lora/sft.sh (new file, mode 100644)

# Experimental environment: 2 * A100
# 2 * 45GB GPU memory
CUDA_VISIBLE_DEVICES=0,1 \
swift sft \
    --model_type cogagent-18b-chat \
    --sft_type lora \
    --tuner_backend peft \
    --dtype AUTO \
    --output_dir output \
    --dataset coco-en-2-mini \
    --num_train_epochs 2 \
    --max_length 2048 \
    --check_dataset_strategy warning \
    --lora_rank 8 \
    --lora_alpha 32 \
    --lora_dropout_p 0.05 \
    --lora_target_modules DEFAULT \
    --gradient_checkpointing false \
    --batch_size 1 \
    --weight_decay 0.1 \
    --learning_rate 1e-4 \
    --gradient_accumulation_steps 16 \
    --max_grad_norm 0.5 \
    --warmup_ratio 0.03 \
    --eval_steps 100 \
    --save_steps 100 \
    --save_total_limit 2 \
    --logging_steps 10
swift-main/examples/pytorch/llm/scripts/custom/tigerbot_13b_chat/qlora_ddp_ds/infer.sh (new file, mode 100644)

# Experimental environment: A10
PYTHONPATH=../../.. \
CUDA_VISIBLE_DEVICES=0 \
python llm_infer.py \
    --ckpt_dir "output/tigerbot-13b-chat/vx-xxx/checkpoint-xxx" \
    --load_dataset_config true \
    --max_new_tokens 2048 \
    --temperature 0.3 \
    --top_p 0.7 \
    --repetition_penalty 1. \
    --do_sample true \
    --merge_lora false
swift-main/examples/pytorch/llm/scripts/custom/tigerbot_13b_chat/qlora_ddp_ds/sft.sh (new file, mode 100644)

# Experimental environment: 2 * 3090
# 2 * 12GB GPU memory
nproc_per_node=2
PYTHONPATH=../../.. \
CUDA_VISIBLE_DEVICES=0,1 \
torchrun \
    --nproc_per_node=$nproc_per_node \
    --master_port 29500 \
    llm_sft.py \
    --model_type tigerbot-13b-chat \
    --sft_type lora \
    --tuner_backend peft \
    --template_type AUTO \
    --dtype AUTO \
    --output_dir output \
    --ddp_backend nccl \
    --dataset stsb-en \
    --num_train_epochs 1 \
    --max_length 2048 \
    --check_dataset_strategy warning \
    --quantization_bit 4 \
    --bnb_4bit_comp_dtype AUTO \
    --lora_rank 8 \
    --lora_alpha 32 \
    --lora_dropout_p 0.05 \
    --lora_target_modules DEFAULT \
    --gradient_checkpointing true \
    --batch_size 1 \
    --weight_decay 0.1 \
    --learning_rate 1e-4 \
    --gradient_accumulation_steps $(expr 16 / $nproc_per_node) \
    --max_grad_norm 0.5 \
    --warmup_ratio 0.03 \
    --eval_steps 100 \
    --save_steps 100 \
    --save_total_limit 2 \
    --logging_steps 10 \
    --deepspeed default-zero2
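--quantization_bit 4 with --bnb_4bit_comp_dtype corresponds to bitsandbytes 4-bit (QLoRA-style) loading. A hedged sketch of the analogous transformers config; AUTO is assumed to resolve to bfloat16 here, which may differ per model:

import torch
from transformers import BitsAndBytesConfig

# Illustrative bitsandbytes config matching --quantization_bit 4.
# --bnb_4bit_comp_dtype AUTO is assumed to resolve to bfloat16.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)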
swift-main/examples/pytorch/llm/scripts/custom/tigerbot_7b/lora_ddp_ds/infer.sh (new file, mode 100644)

# Experimental environment: A10
PYTHONPATH=../../.. \
CUDA_VISIBLE_DEVICES=0 \
python llm_infer.py \
    --ckpt_dir "output/tigerbot-7b/vx-xxx/checkpoint-xxx" \
    --load_dataset_config true \
    --max_new_tokens 2048 \
    --temperature 0.3 \
    --top_p 0.7 \
    --repetition_penalty 1. \
    --do_sample true \
    --merge_lora false
swift-main/examples/pytorch/llm/scripts/custom/tigerbot_7b/lora_ddp_ds/sft.sh (new file, mode 100644)

# Experimental environment: 2 * 3090
# 2 * 16GB GPU memory
nproc_per_node=2
PYTHONPATH=../../.. \
CUDA_VISIBLE_DEVICES=0,1 \
torchrun \
    --nproc_per_node=$nproc_per_node \
    --master_port 29500 \
    llm_sft.py \
    --model_type tigerbot-7b \
    --sft_type lora \
    --tuner_backend peft \
    --template_type default-generation \
    --dtype AUTO \
    --output_dir output \
    --ddp_backend nccl \
    --dataset stsb-en \
    --num_train_epochs 1 \
    --max_length 2048 \
    --check_dataset_strategy warning \
    --lora_rank 8 \
    --lora_alpha 32 \
    --lora_dropout_p 0.05 \
    --lora_target_modules DEFAULT \
    --gradient_checkpointing true \
    --batch_size 1 \
    --weight_decay 0.1 \
    --learning_rate 1e-4 \
    --gradient_accumulation_steps $(expr 16 / $nproc_per_node) \
    --max_grad_norm 0.5 \
    --warmup_ratio 0.03 \
    --eval_steps 100 \
    --save_steps 100 \
    --save_total_limit 2 \
    --logging_steps 10 \
    --deepspeed default-zero2
swift-main/examples/pytorch/llm/scripts/dbrx-instruct/lora_mp/infer.sh (new file, mode 100644)

# Experimental environment: 4 * A100
# 4 * 65GB GPU memory
CUDA_VISIBLE_DEVICES=0,1,2,3 \
swift infer \
    --ckpt_dir "output/dbrx-instruct/vx-xxx/checkpoint-xxx" \
    --load_dataset_config true \
    --use_flash_attn true \
    --temperature 0.3 \
    --top_p 0.7 \
    --repetition_penalty 1. \
    --do_sample true \
    --merge_lora false
swift-main/examples/pytorch/llm/scripts/dbrx-instruct/lora_mp/sft.sh (new file, mode 100644)

# Experimental environment: 4 * A100
# 4 * 74GB GPU memory
CUDA_VISIBLE_DEVICES=0,1,2,3 \
swift sft \
    --model_type dbrx-instruct \
    --model_revision master \
    --sft_type lora \
    --tuner_backend peft \
    --template_type AUTO \
    --dtype bf16 \
    --output_dir output \
    --ddp_backend nccl \
    --dataset blossom-math-zh \
    --num_train_epochs 1 \
    --max_length 1024 \
    --check_dataset_strategy warning \
    --lora_rank 8 \
    --lora_alpha 32 \
    --lora_dropout_p 0.05 \
    --lora_target_modules ALL \
    --lora_dtype AUTO \
    --gradient_checkpointing false \
    --batch_size 1 \
    --weight_decay 0.1 \
    --learning_rate 1e-4 \
    --gradient_accumulation_steps 16 \
    --max_grad_norm 0.5 \
    --warmup_ratio 0.03 \
    --eval_steps 100 \
    --save_steps 100 \
    --save_total_limit 2 \
    --logging_steps 10 \
    --use_flash_attn true
swift-main/examples/pytorch/llm/scripts/deepseek-v2-chat/lora_mp/deepseek2_device_map.json (new file, mode 100644)

{
    "model.embed_tokens": "cuda:0",
    "model.layers.0": "cuda:0", "model.layers.1": "cuda:0", "model.layers.2": "cuda:0",
    "model.layers.3": "cuda:0", "model.layers.4": "cuda:0", "model.layers.5": "cuda:0",
    "model.layers.6": "cuda:0",
    "model.layers.7": "cuda:1", "model.layers.8": "cuda:1", "model.layers.9": "cuda:1",
    "model.layers.10": "cuda:1", "model.layers.11": "cuda:1", "model.layers.12": "cuda:1",
    "model.layers.13": "cuda:1",
    "model.layers.14": "cuda:2", "model.layers.15": "cuda:2", "model.layers.16": "cuda:2",
    "model.layers.17": "cuda:2", "model.layers.18": "cuda:2", "model.layers.19": "cuda:2",
    "model.layers.20": "cuda:2",
    "model.layers.21": "cuda:3", "model.layers.22": "cuda:3", "model.layers.23": "cuda:3",
    "model.layers.24": "cuda:3", "model.layers.25": "cuda:3", "model.layers.26": "cuda:3",
    "model.layers.27": "cuda:3",
    "model.layers.28": "cuda:4", "model.layers.29": "cuda:4", "model.layers.30": "cuda:4",
    "model.layers.31": "cuda:4", "model.layers.32": "cuda:4", "model.layers.33": "cuda:4",
    "model.layers.34": "cuda:4", "model.layers.35": "cuda:4",
    "model.layers.36": "cuda:5", "model.layers.37": "cuda:5", "model.layers.38": "cuda:5",
    "model.layers.39": "cuda:5", "model.layers.40": "cuda:5", "model.layers.41": "cuda:5",
    "model.layers.42": "cuda:5", "model.layers.43": "cuda:5",
    "model.layers.44": "cuda:6", "model.layers.45": "cuda:6", "model.layers.46": "cuda:6",
    "model.layers.47": "cuda:6", "model.layers.48": "cuda:6", "model.layers.49": "cuda:6",
    "model.layers.50": "cuda:6", "model.layers.51": "cuda:6",
    "model.layers.52": "cuda:7", "model.layers.53": "cuda:7", "model.layers.54": "cuda:7",
    "model.layers.55": "cuda:7", "model.layers.56": "cuda:7", "model.layers.57": "cuda:7",
    "model.layers.58": "cuda:7", "model.layers.59": "cuda:7",
    "model.norm": "cuda:7",
    "lm_head": "cuda:7"
}
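This JSON pins every module of the 60-layer model to a specific GPU; the infer.sh and sft.sh scripts below feed it to swift via --device_map_config_path. Outside swift, the same mapping could be handed directly to transformers, roughly as follows (the model id and dtype are assumptions):

import json
import torch
from transformers import AutoModelForCausalLM

# Illustrative: load the hand-written device map and pass it to transformers.
with open("deepseek2_device_map.json") as f:
    device_map = json.load(f)

model = AutoModelForCausalLM.from_pretrained(
    "deepseek-ai/DeepSeek-V2-Chat",  # assumed model id
    device_map=device_map,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
)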
swift-main/examples/pytorch/llm/scripts/deepseek-v2-chat/lora_mp/infer.sh (new file, mode 100644)

# Experimental environment: 8 * A100
# cd /path/to/swift/examples/pytorch/llm
PYTHONPATH=../../.. \
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 \
python llm_infer.py \
    --ckpt_dir output/deepseek-v2-chat/vx-xxx/checkpoint-xxx \
    --load_dataset_config true \
    --use_flash_attn true \
    --max_new_tokens 2048 \
    --temperature 0.1 \
    --top_p 0.7 \
    --repetition_penalty 1. \
    --do_sample true \
    --device_map_config_path scripts/deepseek-v2-chat/lora_ddp_ds3/deepseek2_device_map.json
swift-main/examples/pytorch/llm/scripts/deepseek-v2-chat/lora_mp/sft.sh (new file, mode 100644)

# Experimental environment: 8 * A100
# 8 * 80GB GPU memory
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 \
swift sft \
    --model_type deepseek-v2-chat \
    --sft_type lora \
    --tuner_backend peft \
    --dtype bf16 \
    --output_dir output \
    --ddp_backend nccl \
    --dataset alpaca-zh#5000 \
    --num_train_epochs 1 \
    --max_length 1024 \
    --check_dataset_strategy warning \
    --lora_rank 8 \
    --lora_alpha 32 \
    --lora_dropout_p 0.05 \
    --lora_dtype AUTO \
    --lora_target_modules DEFAULT \
    --gradient_checkpointing false \
    --use_flash_attn true \
    --batch_size 1 \
    --weight_decay 0.1 \
    --learning_rate 1e-4 \
    --gradient_accumulation_steps 16 \
    --max_grad_norm 0.5 \
    --warmup_ratio 0.03 \
    --eval_steps 100 \
    --save_steps 100 \
    --save_total_limit 10 \
    --logging_steps 10 \
    --device_map_config_path scripts/deepseek-v2-chat/lora_ddp_ds3/deepseek2_device_map.json
swift-main/examples/pytorch/llm/scripts/deepseek_moe_16b_chat/lora/infer.sh (new file, mode 100644)

# Experimental environment: A100
CUDA_VISIBLE_DEVICES=0 \
swift infer \
    --ckpt_dir "output/deepseek-moe-16b-chat/vx-xxx/checkpoint-xxx" \
    --load_dataset_config true \
    --use_flash_attn true \
    --max_new_tokens 2048 \
    --temperature 0.1 \
    --top_p 0.7 \
    --repetition_penalty 1. \
    --do_sample true
swift-main/examples/pytorch/llm/scripts/deepseek_moe_16b_chat/lora/sft.sh (new file, mode 100644)

# Experimental environment: A100
# 52GB GPU memory
CUDA_VISIBLE_DEVICES=0 \
swift sft \
    --model_type deepseek-moe-16b-chat \
    --dataset damo-agent-mini-zh \
    --train_dataset_sample 20000 \
    --max_length 4096 \
    --gradient_checkpointing true \
    --eval_steps 100 \
    --use_flash_attn true \
    --output_dir output