# Introduction

* Train a Mask R-CNN model with TensorFlow 2.7 on the COCO 2017 dataset.

# Environment setup

## 1) Install the packages

Install TensorFlow 2.7 in a dtk22.04.1 environment:

```
conda create -n Maskrcnn_tf python=3.7
conda env list
conda activate Maskrcnn_tf
pip3 install tensorflow-2.7.0_dtk22.04-cp37-cp37m-linux_x86_64.whl
```

- Install pycocotools

```
pip3 install pycocotools -i http://pypi.douban.com/simple/ --trusted-host pypi.douban.com
```

- Update pandas

```
pip3 install -U pandas -i http://pypi.douban.com/simple/ --trusted-host pypi.douban.com
```

- Install dllogger

```
git clone --recursive https://github.com/NVIDIA/dllogger.git
# setup.py lives inside the cloned repository
cd dllogger
python3 setup.py install
cd ..
```

- Install horovod

```
wget https://files.pythonhosted.org/packages/a7/8b/fcf373aade011f88dafe48628f7d348471d2891974eb5762ceeff3f13920/horovod-0.25.0.tar.gz
tar -zxvf horovod-0.25.0.tar.gz
# enter the extracted directory, not the archive
cd horovod-0.25.0
python3 setup.py build --force develop
```

- Install the other dependencies

```
pip3 install opencv-python
pip3 install pyyaml
```

## 2) Data preparation (train and val)

```
cd dataset/
git clone http://github.com/tensorflow/models tf-models
cd tf-models/research
# download the protoc 3.0.0 release and save it as protobuf.zip
wget -O protobuf.zip https://github.com/google/protobuf/releases/download/v3.0.0/protoc-3.0.0-linux-x86_64.zip
unzip protobuf.zip
# compile every .proto file of the object detection API
./bin/protoc object_detection/protos/*.proto --python_out=.
```

Return to the `dataset` directory, open `create_coco_tf_record.py` (for example with `vim create_coco_tf_record.py`), and comment out lines 310 and 316. Then run:

```
PYTHONPATH="tf-models:tf-models/research" python3 create_coco_tf_record.py \
  --logtostderr \
  --include_masks \
  --train_image_dir=/path/to/COCO2017/images/train2017 \
  --val_image_dir=/path/to/COCO2017/images/val2017 \
  --train_object_annotations_file=/path/to/COCO2017/annotations/instances_train2017.json \
  --val_object_annotations_file=/path/to/COCO2017/annotations/instances_val2017.json \
  --train_caption_annotations_file=/path/to/COCO2017/annotations/captions_train2017.json \
  --val_caption_annotations_file=/path/to/COCO2017/annotations/captions_val2017.json \
  --output_dir=coco2017_tfrecord
```

This generates the `coco2017_tfrecord` folder.

## 3) Download the pretrained models

The resulting model file structure is as follows:

```
weights/
>mask-rcnn/1555659850/
https://storage.googleapis.com/cloud-tpu-checkpoints/mask-rcnn/1555659850/saved_model.pb
>>variables/
https://storage.googleapis.com/cloud-tpu-checkpoints/mask-rcnn/1555659850/variables/variables.data-00000-of-00001
https://storage.googleapis.com/cloud-tpu-checkpoints/mask-rcnn/1555659850/variables/variables.index
>resnet/
>>extracted_from_maskrcnn/
>>resnet-nhwc-2018-02-07/
https://storage.googleapis.com/cloud-tpu-checkpoints/retinanet/resnet50-checkpoint-2018-02-07/checkpoint
>>>model.ckpt-112603/
https://storage.googleapis.com/cloud-tpu-checkpoints/retinanet/resnet50-checkpoint-2018-02-07/model.ckpt-112603.data-00000-of-00001
https://storage.googleapis.com/cloud-tpu-checkpoints/retinanet/resnet50-checkpoint-2018-02-07/model.ckpt-112603.index
https://storage.googleapis.com/cloud-tpu-checkpoints/retinanet/resnet50-checkpoint-2018-02-07/model.ckpt-112603.meta
>>resnet-nhwc-2018-10-14/
```

# Testing

## Single-card training

```
# without XLA
export HIP_VISIBLE_DEVICES=0
python3 scripts/benchmark_training.py --gpus 1 \
  --batch_size 2 \
  --model_dir save_model \
  --data_dir /public/home/tianlh/AI-application/Tensorflow/MaskRCNN_tf2/dataset/coco2017_tfrecord \
  --weights_dir weights

# with XLA
export HIP_VISIBLE_DEVICES=0
python3 scripts/benchmark_training_xla.py --gpus 1 \
  --batch_size 2 \
  --model_dir save_model \
  --data_dir /public/home/tianlh/AI-application/Tensorflow/MaskRCNN_tf2/dataset/coco2017_tfrecord \
  --weights_dir weights
```

## Multi-card training

```
# without XLA
export HIP_VISIBLE_DEVICES=0,1
python3 scripts/benchmark_training.py --gpus 2 \
  --batch_size 4 \
  --model_dir save_model_2dcu \
  --data_dir /public/home/tianlh/AI-application/Tensorflow/MaskRCNN_tf2/dataset/coco2017_tfrecord \
  --weights_dir weights

# with XLA
export HIP_VISIBLE_DEVICES=0,1
python3 scripts/benchmark_training_xla.py --gpus 2 \
  --batch_size 4 \
  --model_dir save_model_2dcu \
  --data_dir /public/home/tianlh/AI-application/Tensorflow/MaskRCNN_tf2/dataset/coco2017_tfrecord \
  --weights_dir weights
```

## Inference

```
# without XLA
python3 scripts/benchmark_inference.py --batch_size 2 \
  --model_dir save_model \
  --data_dir /public/home/tianlh/AI-application/Tensorflow/MaskRCNN_tf2/dataset/coco2017_tfrecord \
  --weights_dir weights

# with XLA
python3 scripts/benchmark_inference_xla.py --batch_size 2 \
  --model_dir save_model \
  --data_dir /public/home/tianlh/AI-application/Tensorflow/MaskRCNN_tf2/dataset/coco2017_tfrecord \
  --weights_dir weights
```

# References

[https://github.com/NVIDIA/DeepLearningExamples/tree/master/TensorFlow2/Segmentation/MaskRCNN](https://github.com/NVIDIA/DeepLearningExamples/tree/master/TensorFlow2/Segmentation/MaskRCNN)
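
# Appendix: fetching the pretrained weights (sketch)

The pretrained-model step in section 3) lists the download URLs but no commands. The following is a minimal sketch of how those files might be fetched with `wget`; it is not part of the original instructions. The names `WEIGHTS_DIR`, `DRY_RUN`, `fetch`, and `fetch_all` are hypothetical. The destination layout is an assumption read off the listing: in particular, the `model.ckpt-112603.*` files are placed next to `checkpoint`, although the listing could also be read as a `model.ckpt-112603/` subdirectory. The `extracted_from_maskrcnn/` and `resnet-nhwc-2018-10-14/` directories have no URLs in the listing, so the sketch does not touch them.

```shell
#!/bin/sh
# Sketch only: prints (or runs) one wget command per URL in the section 3) listing.
set -eu

BASE="https://storage.googleapis.com/cloud-tpu-checkpoints"
WEIGHTS_DIR="${WEIGHTS_DIR:-weights}"  # hypothetical; default matches --weights_dir weights
DRY_RUN="${DRY_RUN:-1}"                # hypothetical; 1 prints the commands, 0 downloads

# fetch <path under BASE> <directory under WEIGHTS_DIR>
fetch() {
  url="$BASE/$1"
  dest="$WEIGHTS_DIR/$2"
  if [ "$DRY_RUN" = "1" ]; then
    echo "wget -N $url -P $dest"
  else
    mkdir -p "$dest"
    # -N skips files that are already up to date, -P sets the target directory
    wget -N "$url" -P "$dest"
  fi
}

fetch_all() {
  mrcnn="mask-rcnn/1555659850"
  fetch "$mrcnn/saved_model.pb" "$mrcnn"
  fetch "$mrcnn/variables/variables.data-00000-of-00001" "$mrcnn/variables"
  fetch "$mrcnn/variables/variables.index" "$mrcnn/variables"

  src="retinanet/resnet50-checkpoint-2018-02-07"
  dst="resnet/resnet-nhwc-2018-02-07"
  # Assumption: the model.ckpt-112603.* files sit in the same directory as `checkpoint`.
  for f in checkpoint \
           model.ckpt-112603.data-00000-of-00001 \
           model.ckpt-112603.index \
           model.ckpt-112603.meta; do
    fetch "$src/$f" "$dst"
  done
}

fetch_all
```

Run as-is, the script only prints the seven `wget` commands so the target paths can be reviewed first; setting `DRY_RUN=0` performs the downloads. If the training scripts expect a different layout under `weights/`, adjust the second argument of each `fetch` call.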