# Introduction
This test case measures the performance of the Mask R-CNN object detection model. The test procedure is as follows.

# Test procedure
## Enter the working directory
	cd references/detection  
## Dataset preparation

The COCO2017 dataset.
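
Before launching a long run it can help to confirm that the dataset directory is complete. The sketch below assumes the layout that the upstream torchvision detection reference reads for instance segmentation (`train2017/`, `val2017/`, and the two `instances_*2017.json` annotation files); `check_coco_dir` is a hypothetical helper, not part of this repository.

```shell
# Hypothetical helper: report entries missing from a COCO2017 directory.
# Assumed layout (from the upstream torchvision detection reference):
#   train2017/  val2017/  annotations/instances_{train,val}2017.json
check_coco_dir() {
    local root="$1"
    local missing=0
    local entry
    for entry in train2017 val2017 \
                 annotations/instances_train2017.json \
                 annotations/instances_val2017.json; do
        if [ ! -e "${root}/${entry}" ]; then
            echo "missing: ${root}/${entry}"
            missing=1
        fi
    done
    # Return 0 when everything is present, 1 otherwise.
    return "${missing}"
}

# Example: check_coco_dir /path/to/{COCO2017_data_dir}
```

The function prints one line per missing entry and returns a non-zero status if anything is absent, so it can gate the training command with `&&`.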

## Run commands

### Single GPU
	export HIP_VISIBLE_DEVICES=0
	
	python3 train.py --dataset coco --model maskrcnn_resnet50_fpn --epochs 26 \
	     --lr-steps 16 22 --aspect-ratio-group-factor 3 \
	     --data-path /path/to/{COCO2017_data_dir}  
If the step `Downloading: "https://download.pytorch.org/models/resnet50-19c8e357.pth" to .cache/torch/checkpoints/resnet50-19c8e357.pth` fails, download resnet50-19c8e357.pth in advance and copy it to .cache/torch/checkpoints/.  
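
A minimal sketch of the manual copy step, assuming the relative path `.cache/torch/checkpoints/` is located under `$HOME` (the source does not state the base directory). `install_checkpoint` is a hypothetical helper name.

```shell
# Hypothetical helper: place a manually downloaded weight file into the
# torch checkpoint cache so the training script does not need to fetch it.
# Assumption: the cache directory is ${HOME}/.cache/torch/checkpoints
# unless a second argument overrides it.
install_checkpoint() {
    local src="$1"
    local dest="${2:-${HOME}/.cache/torch/checkpoints}"
    mkdir -p "${dest}"
    cp "${src}" "${dest}/"
}

# Example, after fetching the file on a machine with network access:
#   install_checkpoint ./resnet50-19c8e357.pth
```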

### Single node, multiple GPUs

**1) Launching with PyTorch**

	export HIP_VISIBLE_DEVICES=0,1,2,3
	export NGPUS=4
	export OMP_NUM_THREADS=1
	python3 -m torch.distributed.launch --nproc_per_node=${NGPUS} --use_env train.py \
	     --dataset coco --model maskrcnn_resnet50_fpn --epochs 26 \
	     --lr-steps 16 22 --aspect-ratio-group-factor 3 --lr 0.01 \
	     --data-path /path/to/{COCO2017_data_dir} > train_4gpu_lr0.01.log 2>&1 &
Note: when running on multiple GPUs, the learning rate scales with the number of GPUs as 0.02/8*$NGPUS. For example, lr_4gpu=0.01, lr_2gpu=0.005, lr_1gpu=0.0025.  
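
The scaling rule can be computed instead of worked out by hand. This is a small sketch using `awk` for the floating-point arithmetic; `scaled_lr` and `LR` are hypothetical names introduced here for illustration.

```shell
# Hypothetical helper implementing lr = 0.02 / 8 * (number of GPUs).
scaled_lr() {
    awk -v n="$1" 'BEGIN { printf "%g\n", 0.02 / 8 * n }'
}

NGPUS=4
LR=$(scaled_lr "${NGPUS}")
echo "NGPUS=${NGPUS} lr=${LR}"
```

The resulting value can then be passed to the training script as `--lr ${LR}`.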

**2) Launching with MPI**

```
cd references/detection 
mpirun -np $np --bind-to none single_process.sh localhost
```

### Multiple nodes, multiple GPUs

Launching with MPI

```
mpirun -np $np --hostfile hostfile --bind-to none single_process.sh $dist_url
```

Here, $dist_url is the IP address of the master node, and hostfile is the configuration file listing the nodes to use. An example of its format:

```
node1 slots=4  
node2 slots=4
```
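
The value of `$np` is not defined in this README. One plausible choice, assuming one process per slot, is the sum of the `slots=` values in the hostfile. The sketch below computes that sum; `total_slots` is a hypothetical helper, and the one-process-per-slot mapping is an assumption rather than something `single_process.sh` is known to require.

```shell
# Hypothetical helper: sum the slots=N values across all lines of a hostfile.
# Assumption: one MPI process per slot, so the sum is a candidate for -np.
total_slots() {
    awk '{
             for (i = 2; i <= NF; i++) {
                 if ($i ~ /^slots=/) {
                     sub(/^slots=/, "", $i)
                     total += $i
                 }
             }
         }
         END { print total + 0 }' "$1"
}

# Example: np=$(total_slots hostfile)
```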

# References

[https://github.com/pytorch/vision/tree/master/references/detection](https://github.com/pytorch/vision/tree/master/references/detection)