ModelZoo
MobileNetV2_pytorch
Commit 63567b0c
authored Dec 12, 2024 by Sugon_ldc
add model mobilenetv2
Changes: 23
Showing 3 changed files with 84 additions and 0 deletions
single_dcu_train.sh +2 -0
single_one_driver.slurm +26 -0
single_process.sh +56 -0
single_dcu_train.sh
0 → 100644
#!/bin/bash
sbatch run_single_one_driver.sh
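The submit script above calls `sbatch` on `run_single_one_driver.sh`, which is not among the three files shown on this page, and the batch script in this commit writes its logs under `logs/`. Slurm generally does not create the directory named in `-o`/`-e`, so a guarded wrapper can help. The following is a minimal sketch under those assumptions: `submit_job` and the `SBATCH_CMD` override are hypothetical names introduced here for illustration and are not part of the commit.

```shell
#!/bin/bash
# Hypothetical guarded submit wrapper; not part of the commit.
# SBATCH_CMD is an assumed override hook so the function can be dry-run without Slurm.

submit_job() {
    local script="$1"
    local log_dir="${2:-logs}"

    # Fail early if the batch script is missing.
    if [ ! -f "$script" ]; then
        echo "batch script not found: $script" >&2
        return 1
    fi

    # Create the log directory referenced by "#SBATCH -o logs/..." up front.
    mkdir -p "$log_dir" || return 1

    # Hand the script to sbatch (or to the dry-run substitute).
    "${SBATCH_CMD:-sbatch}" "$script"
}
```

For a dry run, `SBATCH_CMD=echo submit_job single_one_driver.slurm` prints the script name instead of submitting it.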
single_one_driver.slurm
0 → 100644
#!/bin/bash
#SBATCH -J test
#SBATCH -p wzhdexclu03
#SBATCH -N 1
##SBATCH -n 32
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=32
#SBATCH --gres=dcu:1
#SBATCH -J single_node_1_dcu_train
#SBATCH -o logs/pt-%j.out
#SBATCH -e logs/pt-%j.err
source ~/miniconda3/etc/profile.d/conda.sh
conda activate torch1.10-dtk22.10-py38
#conda activate base
module purge
module load compiler/devtoolset/7.3.1 mpi/hpcx/gcc-7.3.1 compiler/dtk/23.04
module list
export HIP_VISIBLE_DEVICES=0
python -u driver.py  # the program to run
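The batch script requests one DCU with `--gres=dcu:1` and exposes device 0 through `HIP_VISIBLE_DEVICES`. A pre-flight check that the two agree could look like the sketch below. `count_visible_devices` and the `requested` variable are hypothetical names introduced here, and the comparison is only an illustration, not something the commit does.

```shell
#!/bin/bash
# Hypothetical pre-flight helper; not part of the commit.

# Print how many device IDs a comma-separated list names.
# Falls back to $HIP_VISIBLE_DEVICES when no argument is given.
# Prints 0 for an empty or unset list, i.e. no explicit restriction is set.
count_visible_devices() {
    local list="${1:-${HIP_VISIBLE_DEVICES:-}}"
    local ids=()
    IFS=',' read -r -a ids <<< "$list"
    echo "${#ids[@]}"
}

# Assumed usage: compare against the device count requested via --gres=dcu:1.
requested=1
visible=$(count_visible_devices "0")
if [ "$visible" -ne "$requested" ]; then
    echo "warning: $visible visible device(s), $requested requested" >&2
fi
```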
single_process.sh
0 → 100644
#!/bin/bash
export HSA_FORCE_FINE_GRAIN_PCIE=1
export MIOPEN_FIND_MODE=3
export MIOPEN_COMPILE_PARALLEL_LEVEL=1
export NCCL_PLUGIN_P2P=ucx
export RCCL_NCHANNELS=2
export NCCL_SOCKET_IFNAME=ib0
export NCCL_P2P_LEVEL=5
export NCCL_IB_HCA=mlx5_0
export NCCL_DEBUG=INFO
export NCCL_NET_GDR_LEVEL=SYS
export NCCL_NET_PLUGIN=none
unset RCCL_NCHANNELS
unset NCCL_NET_GDR_LEVEL
#export NCCL_IB_DISABLE=1
#export NCCL_P2P_DISABLE=1
lrank=$OMPI_COMM_WORLD_LOCAL_RANK
echo "LRANK===============================$lrank"
comm_rank=$OMPI_COMM_WORLD_RANK
comm_size=$OMPI_COMM_WORLD_SIZE
export HIP_VISIBLE_DEVICES=0,1,2,3
echo "##################################"
#which hipcc
#pip3 list
APP="python3 `pwd`/driver_mpi.py --dist-url tcp://${1}:12345 --dist-backend nccl --world-size=${comm_size} --rank=${comm_rank}"
case ${lrank} in
[0])
    export HIP_VISIBLE_DEVICES=0,1,2,3
    export UCX_NET_DEVICES=mlx5_0:1
    export UCX_IB_PCI_BW=mlx5_0:50Gbs
    numactl --cpunodebind=0 --membind=0 ${APP}
    ;;
[1])
    export HIP_VISIBLE_DEVICES=0,1,2,3
    export UCX_NET_DEVICES=mlx5_1:1
    export UCX_IB_PCI_BW=mlx5_1:50Gbs
    numactl --cpunodebind=1 --membind=1 ${APP}
    ;;
[2])
    export HIP_VISIBLE_DEVICES=0,1,2,3
    export UCX_NET_DEVICES=mlx5_2:1
    export UCX_IB_PCI_BW=mlx5_2:50Gbs
    numactl --cpunodebind=2 --membind=2 ${APP}
    ;;
[3])
    export HIP_VISIBLE_DEVICES=0,1,2,3
    export UCX_NET_DEVICES=mlx5_3:1
    export UCX_IB_PCI_BW=mlx5_3:50Gbs
    numactl --cpunodebind=3 --membind=3 ${APP}
    ;;
esac
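The four `case` arms in `single_process.sh` differ only in the local rank: rank N uses HCA `mlx5_N`, NUMA node N for both CPU and memory binding, and any other rank value falls through without launching anything. The sketch below shows one way to derive the same settings from the rank. `set_rank_binding` and `NUMA_ARGS` are hypothetical names introduced here, and the one-to-one rank/NUMA/HCA mapping is taken from the script as shown, not verified against any particular node topology.

```shell
#!/bin/bash
# Hypothetical helper; not part of the commit.
# Assumes local rank N pairs with NUMA node N and HCA mlx5_N, as in the case arms.

set_rank_binding() {
    local lrank="$1"

    # Accept only the ranks the original script handles (0 through 3).
    case "$lrank" in
        [0-3]) ;;
        *) echo "unsupported local rank: $lrank" >&2; return 1 ;;
    esac

    export HIP_VISIBLE_DEVICES=0,1,2,3
    export UCX_NET_DEVICES="mlx5_${lrank}:1"
    export UCX_IB_PCI_BW="mlx5_${lrank}:50Gbs"

    # Global array holding the numactl arguments for this rank.
    NUMA_ARGS=(--cpunodebind="${lrank}" --membind="${lrank}")
}
```

Inside a launcher this might be used as `set_rank_binding "$OMPI_COMM_WORLD_LOCAL_RANK" && numactl "${NUMA_ARGS[@]}" ${APP}`, with `APP` built as in the script. Unlike the original, an out-of-range rank returns a non-zero status instead of exiting silently.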