ModelZoo / ResNet50v1.5_pytorch
Commit e9933264, authored Dec 19, 2023 by Sugon_ldc
modify some script
parent e059986a
Showing 5 changed files with 16 additions and 7 deletions

README.md             +12 -3
train_multi_fp16.sh   +1 -1
train_multi_fp32.sh   +1 -1
train_single_fp16.sh  +1 -1
train_single_fp32.sh  +1 -1
README.md
View file @ e9933264
@@ -34,6 +34,8 @@ The ResNet50v1.5 algorithm uses residual connections and deep convolutional layers to build a more
 docker pull image.sourcefind.cn:5000/dcu/admin/base/pytorch:1.13.1-centos7.6-dtk-23.04.1-py38-latest
 docker run -it -v /path/your_code_data/:/path/your_code_data/ --shm-size=32G --privileged=true --device=/dev/kfd --device=/dev/dri/ --group-add video --name docker_name imageID bash
+# After entering the container
+pip install pynvml
 ```
 ### Dockerfile (Method 2)
 How to use the Dockerfile:
@@ -41,6 +43,8 @@ docker run -it -v /path/your_code_data/:/path/your_code_data/ --shm-size=32G --p
 cd ./docker
 docker build --no-cache -t resnet:v1.5 .
 docker run -it -v /path/your_code_data/:/path/your_code_data/ --shm-size=32G --privileged=true --device=/dev/kfd --device=/dev/dri/ --group-add video --name docker_name imageID bash
+# After entering the container
+pip install pynvml
 ```
 ### Anaconda (Method 3)
 Detailed steps for local configuration and compilation, for example:
@@ -49,11 +53,15 @@ docker run -it -v /path/your_code_data/:/path/your_code_data/ --shm-size=32G --p
 ```
 DTK driver: dtk23.04.1
 python: python3.8
-torch: 1.10
+torch: 1.13
-torchvision: 0.10
+torchvision: 0.14.1
 ```
 `Tips: the versions of the DTK driver, python, paddle, and other DCU-related tools listed above must correspond to each other exactly`
+The following third-party libraries must also be installed
+```
+pip install pyyaml
+pip install pynvml
+```
 ## Dataset
 Imagenet
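The Anaconda section above pins torch to 1.13 and torchvision to 0.14.1 and asks that the listed versions correspond exactly. A minimal shell sketch of such a check might look like the following; `version_matches` and `check_pin` are hypothetical helper names, not part of this repository.

```shell
# Hypothetical helper: succeeds when the installed version equals the pinned
# version, or extends it with a '.', '+', or '-' suffix (e.g. 1.13.1+build).
version_matches() {
    local installed="$1" pinned="$2"
    case "$installed" in
        "$pinned"|"$pinned".*|"$pinned"+*|"$pinned"-*) return 0 ;;
        *) return 1 ;;
    esac
}

# Hypothetical helper: report one package (arguments: name, installed, pinned).
check_pin() {
    if version_matches "$2" "$3"; then
        echo "OK   $1 $2"
    else
        echo "FAIL $1 $2 (expected $3)"
        return 1
    fi
}
```

For example, `check_pin torch "$(python -c 'import torch; print(torch.__version__)')" 1.13` would report whether the installed torch matches the pin, assuming `python` resolves to the environment being checked.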
@@ -77,6 +85,7 @@ data
 ```
 ## Training
+Before running the scripts, adjust the dataset path and log file name in each script to match your setup.
 ### Single node, single card (fp16)
 ```
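The README note in this hunk asks users to adjust the dataset path before running. As a rough sketch, a pre-flight check could resolve the path from an argument or the `DATA_DIR` variable and fail early if the directory is missing. `resolve_data_dir` is a hypothetical helper; only the default `/data/imagenet2012` comes from the scripts in this commit.

```shell
# Hypothetical pre-flight check: print the dataset directory to use, taken
# from the first argument, then $DATA_DIR, then the scripts' default path.
# Fails with a message on stderr if the directory does not exist.
resolve_data_dir() {
    local dir="${1:-${DATA_DIR:-/data/imagenet2012}}"
    if [ ! -d "$dir" ]; then
        echo "dataset directory not found: $dir" >&2
        return 1
    fi
    printf '%s\n' "$dir"
}
```

A training script could then start with `DATA_DIR="$(resolve_data_dir)" || exit 1` so that a wrong path is reported before any training process is launched.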
train_multi_fp16.sh
View file @ e9933264
@@ -2,4 +2,4 @@ export HSA_FORCE_FINE_GRAIN_PCIE=1
 export USE_MIOPEN_BATCHNORM=1
 export HIP_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
 export DATA_DIR=/data/imagenet2012 # dataset path
-python ./multiproc.py --nproc_per_node 8 ./launch.py --model resnet50 --precision AMP --mode convergence --platform Z100L ${DATA_DIR} --data-backend pytorch --epochs 100 --batch-size 128 --workspace ${1:-./run} --raport-file raport.json 2>&1 | tee resnet50_multi_`date +%Y%m%d%H%M%S`.log
+python ./multiproc.py --nproc_per_node 8 ./launch.py --model resnet50 --precision AMP --mode convergence --platform Z100L ${DATA_DIR} --data-backend pytorch --epochs 100 --batch-size 128 --workspace ${1:-./run} --raport-file raport.json 2>&1 | tee resnet50_multi_fp16_`date +%Y%m%d%H%M%S`.log
train_multi_fp32.sh
View file @ e9933264
@@ -2,4 +2,4 @@ export HSA_FORCE_FINE_GRAIN_PCIE=1
 export USE_MIOPEN_BATCHNORM=1
 export HIP_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
 export DATA_DIR=/data/imagenet2012 # dataset path
-python ./multiproc.py --nproc_per_node 8 ./launch.py --model resnet50 --precision FP32 --mode convergence --platform Z100L ${DATA_DIR} --data-backend pytorch --epochs 100 --batch-size 128 --workspace ${1:-./run} --raport-file raport.json 2>&1 | tee resnet50_multi_`date +%Y%m%d%H%M%S`.log
+python ./multiproc.py --nproc_per_node 8 ./launch.py --model resnet50 --precision FP32 --mode convergence --platform Z100L ${DATA_DIR} --data-backend pytorch --epochs 100 --batch-size 128 --workspace ${1:-./run} --raport-file raport.json 2>&1 | tee resnet50_multi_fp32_`date +%Y%m%d%H%M%S`.log
train_single_fp16.sh
View file @ e9933264
 export USE_MIOPEN_BATCHNORM=1
 export HIP_VISIBLE_DEVICES=0
 export DATA_DIR=/data/imagenet2012 # dataset path
-python ./multiproc.py --nproc_per_node 1 ./launch.py --model resnet50 --precision AMP --mode convergence --platform Z100L ${DATA_DIR} --data-backend pytorch --epochs 100 --batch-size 128 --workspace ${1:-./run} --raport-file raport.json 2>&1 | tee resnet50_multi_`date +%Y%m%d%H%M%S`.log
+python ./multiproc.py --nproc_per_node 1 ./launch.py --model resnet50 --precision AMP --mode convergence --platform Z100L ${DATA_DIR} --data-backend pytorch --epochs 100 --batch-size 128 --workspace ${1:-./run} --raport-file raport.json 2>&1 | tee resnet50_single_fp16_`date +%Y%m%d%H%M%S`.log
train_single_fp32.sh
View file @ e9933264
 export USE_MIOPEN_BATCHNORM=1
 export HIP_VISIBLE_DEVICES=0
 export DATA_DIR=/data/imagenet2012 # dataset path
-python ./multiproc.py --nproc_per_node 1 ./launch.py --model resnet50 --precision FP32 --mode convergence --platform Z100L ${DATA_DIR} --data-backend pytorch --epochs 100 --batch-size 128 --workspace ${1:-./run} --raport-file raport.json 2>&1 | tee resnet50_multi_`date +%Y%m%d%H%M%S`.log
+python ./multiproc.py --nproc_per_node 1 ./launch.py --model resnet50 --precision FP32 --mode convergence --platform Z100L ${DATA_DIR} --data-backend pytorch --epochs 100 --batch-size 128 --workspace ${1:-./run} --raport-file raport.json 2>&1 | tee resnet50_single_fp32_`date +%Y%m%d%H%M%S`.log
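After this commit, all four scripts name their logs in the pattern `resnet50_<mode>_<precision>_<timestamp>.log`, which also fixes the single-card scripts that previously wrote `resnet50_multi_` logs. A small sketch of that naming scheme as a reusable function is shown below; `log_name` is a hypothetical helper and is not part of the repository.

```shell
# Hypothetical helper: build a log file name from the run mode (single or
# multi), the precision label (fp16 or fp32), and an optional timestamp.
# When no timestamp is given, the current time is used in the same
# %Y%m%d%H%M%S format as the scripts.
log_name() {
    local mode="$1" precision="$2"
    local stamp="${3:-$(date +%Y%m%d%H%M%S)}"
    printf 'resnet50_%s_%s_%s.log\n' "$mode" "$precision" "$stamp"
}
```

A script could then end its pipeline with `2>&1 | tee "$(log_name single fp32)"`, keeping the mode and precision labels in one place instead of hard-coding them per file.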