# Introduction
* Train a Mask R-CNN model with TensorFlow 2.7 on the COCO 2017 dataset.
# Environment setup

## 1) Install packages
Install TensorFlow 2.7 in a dtk22.04.1 environment:

```
conda create -n Maskrcnn_tf python='3.7'
conda env list
conda activate Maskrcnn_tf
pip3 install tensorflow-2.7.0_dtk22.04-cp37-cp37m-linux_x86_64.whl
```

- Install pycocotools

```
pip3 install pycocotools -i http://pypi.douban.com/simple/ --trusted-host pypi.douban.com 
```

- Upgrade pandas

```
pip3 install -U pandas -i http://pypi.douban.com/simple/ --trusted-host pypi.douban.com  
```

- Install dllogger

```
git clone --recursive https://github.com/NVIDIA/dllogger.git
cd dllogger
python3 setup.py install
```

- Install horovod

```
wget https://files.pythonhosted.org/packages/a7/8b/fcf373aade011f88dafe48628f7d348471d2891974eb5762ceeff3f13920/horovod-0.25.0.tar.gz

tar -zxvf horovod-0.25.0.tar.gz

cd horovod-0.25.0

python3 setup.py build --force develop
```

Install the remaining dependencies:

```
pip3 install opencv-python
pip3 install pyyaml
```
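Once the installs above have finished, it can help to confirm that each package is actually visible to the interpreter in the active conda environment. The following is a minimal sketch, not part of the repository: the helper name `find_missing_modules` is hypothetical, and `cv2` and `yaml` are the import names of opencv-python and pyyaml.

```python
import importlib.util

# Import names of the packages installed in this section.
# cv2 is provided by opencv-python and yaml by pyyaml.
REQUIRED_MODULES = [
    "tensorflow",
    "pycocotools",
    "pandas",
    "dllogger",
    "horovod",
    "cv2",
    "yaml",
]


def find_missing_modules(names):
    """Return the names that cannot be located on the import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = find_missing_modules(REQUIRED_MODULES)
    print("missing:", ", ".join(missing) if missing else "none")
```

This only checks that the modules can be found; it does not import them, so it will not catch a package that is present but fails to load.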



## 2) Data preparation (train and val)

```  
cd dataset/  
git clone http://github.com/tensorflow/models tf-models  
cd tf-models/research  
wget -O protobuf.zip https://github.com/google/protobuf/releases/download/v3.0.0/protoc-3.0.0-linux-x86_64.zip
unzip protobuf.zip
./bin/protoc object_detection/protos/*.proto --python_out=.
```
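Assuming the protoc step completed, each `.proto` file under `object_detection/protos` should have a generated `_pb2.py` module next to it, since `--python_out=.` writes output alongside the inputs. The sketch below lists any that are missing; the function name `missing_pb2_modules` is hypothetical and not part of the repository.

```python
from pathlib import Path


def missing_pb2_modules(protos_dir):
    """Return the .proto file names in protos_dir that have no matching *_pb2.py."""
    protos_dir = Path(protos_dir)
    missing = []
    for proto in sorted(protos_dir.glob("*.proto")):
        generated = protos_dir / f"{proto.stem}_pb2.py"
        if not generated.is_file():
            missing.append(proto.name)
    return missing
```

Run from the `dataset/` directory, `missing_pb2_modules("tf-models/research/object_detection/protos")` should return an empty list if compilation succeeded.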
Return to the `dataset` directory, open the script with `vim create_coco_tf_record.py`, and comment out lines 310 and 316.

```
PYTHONPATH="tf-models:tf-models/research" python3 create_coco_tf_record.py \
  --logtostderr \
  --include_masks \
  --train_image_dir=/path/to/COCO2017/images/train2017 \
  --val_image_dir=/path/to/COCO2017/images/val2017 \
  --train_object_annotations_file=/path/to/COCO2017/annotations/instances_train2017.json \
  --val_object_annotations_file=/path/to/COCO2017/annotations/instances_val2017.json \
  --train_caption_annotations_file=/path/to/COCO2017/annotations/captions_train2017.json \
  --val_caption_annotations_file=/path/to/COCO2017/annotations/captions_val2017.json \
  --output_dir=coco2017_tfrecord  
```
This generates the `coco2017_tfrecord` folder.
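
The conversion command expects a particular COCO 2017 layout under `/path/to/COCO2017`, and a wrong path is easy to miss until the script fails. As a rough pre-flight check, the sketch below rebuilds the six input paths from the flags shown above and reports which ones do not exist. The function names `coco_inputs` and `missing_inputs` are hypothetical.

```python
from pathlib import Path


def coco_inputs(coco_root):
    """Map each input flag of create_coco_tf_record.py to its path under coco_root."""
    root = Path(coco_root)
    inputs = {}
    for split in ("train", "val"):
        inputs[f"{split}_image_dir"] = root / "images" / f"{split}2017"
        inputs[f"{split}_object_annotations_file"] = (
            root / "annotations" / f"instances_{split}2017.json"
        )
        inputs[f"{split}_caption_annotations_file"] = (
            root / "annotations" / f"captions_{split}2017.json"
        )
    return inputs


def missing_inputs(coco_root):
    """Return the flag names whose expected path does not exist."""
    return [flag for flag, path in coco_inputs(coco_root).items() if not path.exists()]
```

An empty list from `missing_inputs("/path/to/COCO2017")` means every path passed to the command is present.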
## 3) Download the pretrained models
The model files should be laid out as follows; the URLs are the download sources:

``` 
weights/
>mask-rcnn/1555659850/  
https://storage.googleapis.com/cloud-tpu-checkpoints/mask-rcnn/1555659850/saved_model.pb 
>>variables/  
https://storage.googleapis.com/cloud-tpu-checkpoints/mask-rcnn/1555659850/variables/variables.data-00000-of-00001  
https://storage.googleapis.com/cloud-tpu-checkpoints/mask-rcnn/1555659850/variables/variables.index  
>resnet/
>>extracted_from_maskrcnn/
>>resnet-nhwc-2018-02-07/  
https://storage.googleapis.com/cloud-tpu-checkpoints/retinanet/resnet50-checkpoint-2018-02-07/checkpoint 
>>>model.ckpt-112603/  
https://storage.googleapis.com/cloud-tpu-checkpoints/retinanet/resnet50-checkpoint-2018-02-07/model.ckpt-112603.data-00000-of-00001  
https://storage.googleapis.com/cloud-tpu-checkpoints/retinanet/resnet50-checkpoint-2018-02-07/model.ckpt-112603.index  
https://storage.googleapis.com/cloud-tpu-checkpoints/retinanet/resnet50-checkpoint-2018-02-07/model.ckpt-112603.meta  
>>resnet-nhwc-2018-10-14/
```
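The listing above can be turned into a small download helper. This is only a sketch under stated assumptions: it covers just the files that have a URL in the listing (nothing is known here about `extracted_from_maskrcnn/` or `resnet-nhwc-2018-10-14/`), and it assumes the `model.ckpt-112603.*` files belong in the same directory as `checkpoint`. The names `WEIGHT_FILES`, `download_plan`, and `fetch_all` are hypothetical.

```python
from pathlib import Path
from urllib.request import urlretrieve

BASE = "https://storage.googleapis.com/cloud-tpu-checkpoints"
RESNET = f"{BASE}/retinanet/resnet50-checkpoint-2018-02-07"

# Destination directory (relative to weights/) -> remote files from the listing.
# Assumption: the model.ckpt-112603.* files sit next to the `checkpoint` file.
WEIGHT_FILES = {
    "mask-rcnn/1555659850": [
        f"{BASE}/mask-rcnn/1555659850/saved_model.pb",
    ],
    "mask-rcnn/1555659850/variables": [
        f"{BASE}/mask-rcnn/1555659850/variables/variables.data-00000-of-00001",
        f"{BASE}/mask-rcnn/1555659850/variables/variables.index",
    ],
    "resnet/resnet-nhwc-2018-02-07": [
        f"{RESNET}/checkpoint",
        f"{RESNET}/model.ckpt-112603.data-00000-of-00001",
        f"{RESNET}/model.ckpt-112603.index",
        f"{RESNET}/model.ckpt-112603.meta",
    ],
}


def download_plan(weights_root="weights"):
    """Return (url, destination path) pairs without touching the network."""
    plan = []
    for rel_dir, urls in WEIGHT_FILES.items():
        for url in urls:
            file_name = url.rsplit("/", 1)[-1]
            plan.append((url, Path(weights_root) / rel_dir / file_name))
    return plan


def fetch_all(weights_root="weights"):
    """Download every planned file, skipping files that already exist."""
    for url, dest in download_plan(weights_root):
        dest.parent.mkdir(parents=True, exist_ok=True)
        if not dest.exists():
            urlretrieve(url, str(dest))
```

Calling `download_plan()` first shows where each file would land; `fetch_all()` then performs the downloads.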

# Testing

## Single-card training
```
export HIP_VISIBLE_DEVICES=0
python3 scripts/benchmark_training.py --gpus 1 --batch_size 2 --model_dir save_model --data_dir /public/home/tianlh/AI-application/Tensorflow/MaskRCNN_tf2/dataset/coco2017_tfrecord --weights_dir weights 
```
## Multi-card training
``` 
export HIP_VISIBLE_DEVICES=0,1
python3 scripts/benchmark_training.py --gpus 2 --batch_size 4 --model_dir save_model_2dcu --data_dir /public/home/tianlh/AI-application/Tensorflow/MaskRCNN_tf2/dataset/coco2017_tfrecord --weights_dir weights 
```
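
In both training commands the `--gpus` value matches the number of device IDs listed in `HIP_VISIBLE_DEVICES`. The sketch below keeps the two in step by deriving `--gpus` from the device list; it only assembles the environment and argument list and does not start training. The helper names `count_devices` and `training_command` are hypothetical, and the batch size is left for the caller to choose.

```python
import os


def count_devices(visible_devices):
    """Count the device IDs in a comma-separated list such as "0,1"."""
    return len([d for d in visible_devices.split(",") if d.strip()])


def training_command(visible_devices, batch_size, model_dir, data_dir, weights_dir="weights"):
    """Return (env, argv) for scripts/benchmark_training.py.

    --gpus is set to the number of IDs in visible_devices.
    """
    env = dict(os.environ, HIP_VISIBLE_DEVICES=visible_devices)
    argv = [
        "python3", "scripts/benchmark_training.py",
        "--gpus", str(count_devices(visible_devices)),
        "--batch_size", str(batch_size),
        "--model_dir", model_dir,
        "--data_dir", data_dir,
        "--weights_dir", weights_dir,
    ]
    return env, argv
```

For example, `env, argv = training_command("0,1", 4, "save_model_2dcu", "/path/to/coco2017_tfrecord")` reproduces the two-card command, and `subprocess.run(argv, env=env, check=True)` would launch it.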

## Inference
```
python3 scripts/benchmark_inference.py --batch_size 2 --model_dir save_model --data_dir /public/home/tianlh/AI-application/Tensorflow/MaskRCNN_tf2/dataset/coco2017_tfrecord --weights_dir weights
```

# References
[https://github.com/NVIDIA/DeepLearningExamples/tree/master/TensorFlow2/Segmentation/MaskRCNN](https://github.com/NVIDIA/DeepLearningExamples/tree/master/TensorFlow2/Segmentation/MaskRCNN)