ModelZoo / ChatGLM-6B_pytorch
Commit 854699f0, authored Jul 20, 2023 by zhaoying1
Add new file
Parent: c4c5ae73
Showing 1 changed file with 75 additions and 0 deletions
ptuning/mpirun_slurm/run_train_single.sh (new file, mode 0 → 100644)
#!/bin/bash
# ROCm and MIOpen runtime settings
export HSA_FORCE_FINE_GRAIN_PCIE=1
export MIOPEN_FIND_MODE=3
export MIOPEN_COMPILE_PARALLEL_LEVEL=1

# RCCL/NCCL communication settings (InfiniBand interface ib0, adapter mlx5_0)
export NCCL_PLUGIN_P2P=ucx
export RCCL_NCHANNELS=2
export NCCL_SOCKET_IFNAME=ib0
export NCCL_P2P_LEVEL=5
export NCCL_IB_HCA=mlx5_0
export NCCL_DEBUG=INFO
export NCCL_NET_GDR_LEVEL=SYS
export NCCL_NET_PLUGIN=none

# These two are unset again right after being exported, so they do not reach the job
unset RCCL_NCHANNELS
unset NCCL_NET_GDR_LEVEL
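
# Ranks and world size as provided by Open MPI (mpirun)
lrank=$OMPI_COMM_WORLD_LOCAL_RANK
echo "LRANK===============================$lrank"
# RANK and WORLD_SIZE are assigned but not exported here
RANK=$OMPI_COMM_WORLD_RANK
WORLD_SIZE=$OMPI_COMM_WORLD_SIZE
export HIP_VISIBLE_DEVICES=0,1,2,3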

LR=1e-5
APP="python3 /public/home/zhaoying1/work/chatglm-main/ptuning/main-v1.py \
    --deepspeed /public/home/zhaoying1/work/chatglm-main/ptuning/deepspeed.json \
    --do_train \
    --train_file /public/home/zhaoying1/work/chatglm-main/ptuning/sugon_md_word_faq.json \
    --prompt_column prompt \
    --response_column response \
    --model_name_or_path /public/home/zhaoying1/work/model_scope/chatglm-6b \
    --output_dir ./pt_output/pretrain \
    --overwrite_output_dir \
    --max_source_length 3 \
    --max_target_length 1024 \
    --per_device_train_batch_size 1 \
    --per_device_eval_batch_size 1 \
    --gradient_accumulation_steps 1 \
    --predict_with_generate \
    --max_steps 2000 \
    --logging_steps 5 \
    --save_steps 1000 \
    --learning_rate $LR \
    --fp16 \
    --local_rank $lrank"

# Bind each local rank to its own NUMA node and InfiniBand adapter, then launch training
case ${lrank} in
[0])
  export HIP_VISIBLE_DEVICES=0,1,2,3
  export UCX_NET_DEVICES=mlx5_0:1
  export UCX_IB_PCI_BW=mlx5_0:50Gbs
  numactl --cpunodebind=0 --membind=0 ${APP}
  ;;
[1])
  export HIP_VISIBLE_DEVICES=0,1,2,3
  export UCX_NET_DEVICES=mlx5_1:1
  export UCX_IB_PCI_BW=mlx5_1:50Gbs
  numactl --cpunodebind=1 --membind=1 ${APP}
  ;;
[2])
  export HIP_VISIBLE_DEVICES=0,1,2,3
  export UCX_NET_DEVICES=mlx5_2:1
  export UCX_IB_PCI_BW=mlx5_2:50Gbs
  numactl --cpunodebind=2 --membind=2 ${APP}
  ;;
[3])
  export HIP_VISIBLE_DEVICES=0,1,2,3
  export UCX_NET_DEVICES=mlx5_3:1
  export UCX_IB_PCI_BW=mlx5_3:50Gbs
  numactl --cpunodebind=3 --membind=3 ${APP}
  ;;
esac
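
The four case branches in the script differ only in the local rank number, so the binding could be derived from the rank instead. The following is a minimal sketch, not part of the commit: the function name `apply_binding` and the variable `NUMACTL_PREFIX` are hypothetical, and it assumes the same mapping the script uses (local rank N goes to NUMA node N and adapter mlx5_N, for N in 0..3).

```shell
#!/bin/bash
# Sketch: compute the per-rank binding from the local rank.
# Assumption (taken from the case branches): rank N -> NUMA node N, HCA mlx5_N.

# Hypothetical helper. Exports the UCX/HIP settings for the given local rank
# and stores the matching numactl command prefix in NUMACTL_PREFIX.
# Returns 1 without changing anything if the rank is not in 0..3.
apply_binding() {
    local rank="$1"
    case "$rank" in
        [0-3]) ;;
        *) echo "unsupported local rank: $rank" >&2; return 1 ;;
    esac
    export HIP_VISIBLE_DEVICES=0,1,2,3
    export UCX_NET_DEVICES="mlx5_${rank}:1"
    export UCX_IB_PCI_BW="mlx5_${rank}:50Gbs"
    NUMACTL_PREFIX="numactl --cpunodebind=${rank} --membind=${rank}"
}
```

With `APP` defined as in the script, the whole case statement would then reduce to `apply_binding "$lrank" && $NUMACTL_PREFIX $APP`. Unlike the original, this variant reports an error for a local rank outside 0..3 rather than silently doing nothing.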