
# Introduction
* Trains a Mask R-CNN model with TensorFlow 2.7 on the COCO 2017 dataset.
# Environment setup

## 1) Install packages
Install TensorFlow 2.7 in a dtk22.04.1 environment:

```
conda create -n Maskrcnn_tf python='3.7'
conda env list
conda activate Maskrcnn_tf
pip3 install tensorflow-2.7.0_dtk22.04-cp37-cp37m-linux_x86_64.whl
```

- Install pycocotools

```
pip3 install pycocotools -i http://pypi.douban.com/simple/ --trusted-host pypi.douban.com 
```

- Upgrade pandas

```
pip3 install -U pandas -i http://pypi.douban.com/simple/ --trusted-host pypi.douban.com  
```

- Install dllogger

```
git clone --recursive https://github.com/NVIDIA/dllogger.git
cd dllogger
python3 setup.py install
```

- Install horovod

```
wget https://files.pythonhosted.org/packages/a7/8b/fcf373aade011f88dafe48628f7d348471d2891974eb5762ceeff3f13920/horovod-0.25.0.tar.gz

tar -zxvf horovod-0.25.0.tar.gz

cd horovod-0.25.0

python3 setup.py build --force develop
```

- Install other dependencies

```
pip3 install opencv-python
pip3 install pyyaml
```
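Once the steps above are done, a quick check can confirm that each package is visible to the active conda environment. The following is a minimal sketch, not part of the original project: the helper name `find_missing` is hypothetical, and the module list assumes the usual import names (`cv2` for opencv-python, `yaml` for pyyaml). It only checks that a module can be located; it does not import it or verify its version.

```python
import importlib.util

# Import names for the packages installed above. The import name can differ
# from the pip name: opencv-python -> cv2, pyyaml -> yaml.
REQUIRED_MODULES = [
    "tensorflow", "pycocotools", "pandas", "dllogger", "horovod", "cv2", "yaml",
]


def find_missing(modules):
    """Return the top-level module names that cannot be located."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = find_missing(REQUIRED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules were found.")
```

Run it inside the `Maskrcnn_tf` environment; any name it reports points to the install step that should be repeated.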


## 2) Data preparation (train and val)

```  
cd dataset/  
git clone http://github.com/tensorflow/models tf-models  
cd tf-models/research  
wget -O protobuf.zip https://github.com/google/protobuf/releases/download/v3.0.0/protoc-3.0.0-linux-x86_64.zip
unzip protobuf.zip
./bin/protoc object_detection/protos/*.proto --python_out=.
```
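With `--python_out=.`, `protoc` writes a `<name>_pb2.py` file next to each `<name>.proto` it compiles. The sketch below lists any `.proto` file that has no such companion, which can help spot a failed or partial compile. The helper name `protos_without_pb2` is hypothetical and not part of the project.

```python
from pathlib import Path


def protos_without_pb2(protos_dir):
    """List .proto files in protos_dir that have no matching *_pb2.py file."""
    protos_dir = Path(protos_dir)
    missing = []
    for proto in sorted(protos_dir.glob("*.proto")):
        generated = proto.with_name(proto.stem + "_pb2.py")
        if not generated.exists():
            missing.append(proto.name)
    return missing


if __name__ == "__main__":
    # Run from tf-models/research, the directory where protoc was invoked.
    leftover = protos_without_pb2("object_detection/protos")
    print("Not compiled:", leftover if leftover else "none")
```

An empty result also occurs if the directory contains no `.proto` files, so it is worth confirming the path first.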
Return to the `dataset` directory, open `create_coco_tf_record.py` (for example with `vim`), and comment out lines 310 and 316.

```
PYTHONPATH="tf-models:tf-models/research" python3 create_coco_tf_record.py \
  --logtostderr \
  --include_masks \
  --train_image_dir=/path/to/COCO2017/images/train2017 \
  --val_image_dir=/path/to/COCO2017/images/val2017 \
  --train_object_annotations_file=/path/to/COCO2017/annotations/instances_train2017.json \
  --val_object_annotations_file=/path/to/COCO2017/annotations/instances_val2017.json \
  --train_caption_annotations_file=/path/to/COCO2017/annotations/captions_train2017.json \
  --val_caption_annotations_file=/path/to/COCO2017/annotations/captions_val2017.json \
  --output_dir=coco2017_tfrecord  
```
This generates the `coco2017_tfrecord` folder.
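The conversion command above reads two image directories and four annotation files under the COCO root. Because the conversion can take a long time, it may be worth confirming that these inputs exist first. The following is a minimal sketch: the helper name `missing_coco_inputs` is hypothetical, and the relative paths are copied from the flags above with `/path/to/COCO2017` treated as the root.

```python
from pathlib import Path

# Relative paths taken from the create_coco_tf_record.py flags above.
EXPECTED_INPUTS = [
    "images/train2017",
    "images/val2017",
    "annotations/instances_train2017.json",
    "annotations/instances_val2017.json",
    "annotations/captions_train2017.json",
    "annotations/captions_val2017.json",
]


def missing_coco_inputs(coco_root):
    """Return the expected inputs that do not exist under coco_root."""
    root = Path(coco_root)
    return [rel for rel in EXPECTED_INPUTS if not (root / rel).exists()]


if __name__ == "__main__":
    # Placeholder root; replace with the real dataset location.
    for rel in missing_coco_inputs("/path/to/COCO2017"):
        print("Missing:", rel)
```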
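## 3) Download pretrained models
Download the files from the URLs below and arrange them in the following directory structure: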

``` 
weights/
>mask-rcnn/1555659850/  
https://storage.googleapis.com/cloud-tpu-checkpoints/mask-rcnn/1555659850/saved_model.pb 
>>variables/  
https://storage.googleapis.com/cloud-tpu-checkpoints/mask-rcnn/1555659850/variables/variables.data-00000-of-00001  
https://storage.googleapis.com/cloud-tpu-checkpoints/mask-rcnn/1555659850/variables/variables.index  
>resnet/
>>extracted_from_maskrcnn/
>>resnet-nhwc-2018-02-07/  
https://storage.googleapis.com/cloud-tpu-checkpoints/retinanet/resnet50-checkpoint-2018-02-07/checkpoint 
>>>model.ckpt-112603/  
https://storage.googleapis.com/cloud-tpu-checkpoints/retinanet/resnet50-checkpoint-2018-02-07/model.ckpt-112603.data-00000-of-00001  
https://storage.googleapis.com/cloud-tpu-checkpoints/retinanet/resnet50-checkpoint-2018-02-07/model.ckpt-112603.index  
https://storage.googleapis.com/cloud-tpu-checkpoints/retinanet/resnet50-checkpoint-2018-02-07/model.ckpt-112603.meta  
>>resnet-nhwc-2018-10-14/
```
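The downloads listed above can be scripted. The sketch below is one possible approach and not the project's own tooling: the names `download_plan` and `download_all` are hypothetical, and the destination paths are this sketch's reading of the tree. In particular, it treats `model.ckpt-112603` as a checkpoint prefix and places those files next to `checkpoint`, which is an assumption; adjust the paths if your layout differs. The directories `extracted_from_maskrcnn/` and `resnet-nhwc-2018-10-14/` have no URLs in the tree, so the sketch leaves them alone.

```python
import urllib.request
from pathlib import Path

BASE = "https://storage.googleapis.com/cloud-tpu-checkpoints"
MASKRCNN_URL = BASE + "/mask-rcnn/1555659850"
RESNET_URL = BASE + "/retinanet/resnet50-checkpoint-2018-02-07"

# Local directories relative to weights/ (assumed reading of the tree above).
MASKRCNN_DIR = "mask-rcnn/1555659850"
RESNET_DIR = "resnet/resnet-nhwc-2018-02-07"

# (source URL, destination relative to the weights directory)
FILES = [
    (MASKRCNN_URL + "/saved_model.pb", MASKRCNN_DIR + "/saved_model.pb"),
    (MASKRCNN_URL + "/variables/variables.data-00000-of-00001",
     MASKRCNN_DIR + "/variables/variables.data-00000-of-00001"),
    (MASKRCNN_URL + "/variables/variables.index",
     MASKRCNN_DIR + "/variables/variables.index"),
    (RESNET_URL + "/checkpoint", RESNET_DIR + "/checkpoint"),
    (RESNET_URL + "/model.ckpt-112603.data-00000-of-00001",
     RESNET_DIR + "/model.ckpt-112603.data-00000-of-00001"),
    (RESNET_URL + "/model.ckpt-112603.index",
     RESNET_DIR + "/model.ckpt-112603.index"),
    (RESNET_URL + "/model.ckpt-112603.meta",
     RESNET_DIR + "/model.ckpt-112603.meta"),
]


def download_plan(weights_dir="weights"):
    """Return (url, local_path) pairs without touching the network."""
    root = Path(weights_dir)
    return [(url, root / rel) for url, rel in FILES]


def download_all(weights_dir="weights"):
    """Fetch every file in the plan, skipping files that already exist."""
    for url, dest in download_plan(weights_dir):
        dest.parent.mkdir(parents=True, exist_ok=True)
        if not dest.exists():
            urllib.request.urlretrieve(url, str(dest))
```

Calling `download_all("weights")` from the project root performs the downloads; `download_plan` alone can be used to review the target paths beforehand.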

# Testing

## Single-card training
```
# without XLA
export HIP_VISIBLE_DEVICES=0
python3 scripts/benchmark_training.py --gpus 1 --batch_size 2 --model_dir save_model --data_dir /public/home/tianlh/AI-application/Tensorflow/MaskRCNN_tf2/dataset/coco2017_tfrecord --weights_dir weights 

# with XLA
export HIP_VISIBLE_DEVICES=0
python3 scripts/benchmark_training_xla.py --gpus 1 --batch_size 2 --model_dir save_model --data_dir /public/home/tianlh/AI-application/Tensorflow/MaskRCNN_tf2/dataset/coco2017_tfrecord --weights_dir weights 
```
## Multi-card training
```
# without XLA
export HIP_VISIBLE_DEVICES=0,1
python3 scripts/benchmark_training.py --gpus 2 --batch_size 4 --model_dir save_model_2dcu --data_dir /public/home/tianlh/AI-application/Tensorflow/MaskRCNN_tf2/dataset/coco2017_tfrecord --weights_dir weights 

# with XLA
export HIP_VISIBLE_DEVICES=0,1
python3 scripts/benchmark_training_xla.py --gpus 2 --batch_size 4 --model_dir save_model_2dcu --data_dir /public/home/tianlh/AI-application/Tensorflow/MaskRCNN_tf2/dataset/coco2017_tfrecord --weights_dir weights 
```
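The single-card and multi-card commands above differ only in the device list, the `--gpus` and `--batch_size` values, the output directory, and the choice of script. The sketch below assembles them from parameters. It is illustrative only: the helper name `build_training_command` is hypothetical, the data path in the usage example is a placeholder, and the device IDs are assumed to run from 0 to `gpus - 1` as in the examples above.

```python
import shlex


def build_training_command(gpus, batch_size, model_dir, data_dir,
                           weights_dir="weights", xla=False):
    """Return (env, argv) for a benchmark training run."""
    script = ("scripts/benchmark_training_xla.py" if xla
              else "scripts/benchmark_training.py")
    # Assumption: devices are numbered 0..gpus-1, matching the examples above.
    env = {"HIP_VISIBLE_DEVICES": ",".join(str(i) for i in range(gpus))}
    argv = [
        "python3", script,
        "--gpus", str(gpus),
        "--batch_size", str(batch_size),
        "--model_dir", model_dir,
        "--data_dir", data_dir,
        "--weights_dir", weights_dir,
    ]
    return env, argv


if __name__ == "__main__":
    # Placeholder data path; replace with the real coco2017_tfrecord location.
    env, argv = build_training_command(
        2, 4, "save_model_2dcu", "dataset/coco2017_tfrecord", xla=True)
    print("export HIP_VISIBLE_DEVICES=" + env["HIP_VISIBLE_DEVICES"])
    print(" ".join(shlex.quote(arg) for arg in argv))
```

The returned `env` and `argv` can also be passed to `subprocess.run(argv, env={**os.environ, **env})` to launch the run directly.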

## Inference
```
# without XLA
python3 scripts/benchmark_inference.py --batch_size 2 --model_dir save_model --data_dir /public/home/tianlh/AI-application/Tensorflow/MaskRCNN_tf2/dataset/coco2017_tfrecord --weights_dir weights

# with XLA
python3 scripts/benchmark_inference_xla.py --batch_size 2 --model_dir save_model --data_dir /public/home/tianlh/AI-application/Tensorflow/MaskRCNN_tf2/dataset/coco2017_tfrecord --weights_dir weights
```

# References
[https://github.com/NVIDIA/DeepLearningExamples/tree/master/TensorFlow2/Segmentation/MaskRCNN](https://github.com/NVIDIA/DeepLearningExamples/tree/master/TensorFlow2/Segmentation/MaskRCNN)