# MOSAIC: Mobile Segmentation via decoding Aggregated Information and encoded Context

[![Paper](http://img.shields.io/badge/Paper-arXiv.2112.11623-B3181B?logo=arXiv)](https://arxiv.org/abs/2112.11623)

This repository is the official implementation of the following paper.

* [MOSAIC: Mobile Segmentation via decoding Aggregated Information and encoded Context](https://arxiv.org/abs/2112.11623)

## Description

MOSAIC is a neural network architecture for efficient and accurate semantic
image segmentation on mobile devices. MOSAIC is designed around neural
operations that are commonly supported across diverse mobile hardware
platforms, allowing flexible deployment on a variety of devices. With a simple
asymmetric encoder-decoder structure, consisting of an efficient multi-scale
context encoder and a light-weight hybrid decoder that recovers spatial
details from aggregated information, MOSAIC achieves a better balance between
accuracy and computational cost. Deployed on top of a tailored feature
extraction backbone based on a searched classification network, MOSAIC
achieves a 5% absolute accuracy gain on ADE20K with similar or lower latency
than the current industry-standard MLPerf Mobile v1.0 models and
state-of-the-art architectures.

[MLPerf Mobile v2.0](https://mlcommons.org/en/inference-mobile-20/) included
MOSAIC as a new industry-standard benchmark model for image segmentation.
Please see the details [here](https://mlcommons.org/en/news/mlperf-inference-1q2022/).
You can also refer to the
[MLCommons GitHub repository](https://github.com/mlcommons/mobile_open/tree/main/vision/mosaic).

## History

### Oct 13, 2022

* First release of MOSAIC in TensorFlow 2, including checkpoints pretrained on Cityscapes.

## Maintainers

* Weijun Wang ([weijunw-g](https://github.com/weijunw-g))
* Fang Yang ([fyangf](https://github.com/fyangf))
* Shixin Luo ([luotigerlsx](https://github.com/luotigerlsx))

## Requirements

[![Python](https://img.shields.io/pypi/pyversions/tensorflow.svg?style=plastic)](https://badge.fury.io/py/tensorflow)
[![tf-models-official PyPI](https://badge.fury.io/py/tf-models-official.svg)](https://badge.fury.io/py/tf-models-official)
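The badges above point at the `tensorflow` and `tf-models-official` PyPI
packages. A minimal setup sketch, assuming you install from PyPI
(`tf-models-official` declares TensorFlow as a dependency, so a compatible
release is pulled in automatically):

```shell
# Install the TensorFlow Model Garden package from PyPI;
# this also installs a compatible TensorFlow release.
pip3 install tf-models-official
```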
## Results

The following table shows the mIoU measured on the `cityscapes` dataset.

| Config                 | Backbone             | Resolution | branch_filter_depths | pyramid_pool_bin_nums | mIoU  | Download |
|------------------------|:--------------------:|:----------:|:--------------------:|:---------------------:|:-----:|:--------:|
| Paper reference config | MobileNetMultiAVGSeg | 1024x2048  | [32, 32]             | [4, 8, 16]            | 75.98 | [ckpt](https://storage.googleapis.com/tf_model_garden/vision/mosaic/MobileNetMultiAVGSeg-r1024-ebf32-nogp.tar.gz) \| [tensorboard](https://tensorboard.dev/experiment/okEog90bSwupajFgJwGEIw/#scalars) |
| Current best config    | MobileNetMultiAVGSeg | 1024x2048  | [64, 64]             | [1, 4, 8, 16]         | 77.24 | [ckpt](https://storage.googleapis.com/tf_model_garden/vision/mosaic/MobileNetMultiAVGSeg-r1024-ebf64-gp.tar.gz) \| [tensorboard](https://tensorboard.dev/experiment/l5hkV7JaQM23EXeOBT6oJg/#scalars) |

* `branch_filter_depths`: the number of convolution channels in each branch at a pyramid level after `Spatial Pyramid Pooling`
* `pyramid_pool_bin_nums`: the number of bins at each level of the `Spatial Pyramid Pooling` (a bin number of 1 corresponds to global pooling)

## Training

Training can run on Google Cloud Platform using Cloud TPU.
[Here](https://cloud.google.com/tpu/docs/how-to) are the instructions for
using Cloud TPU. Follow the instructions to set up a Cloud TPU, then launch
training with:

```shell
EXP_TYPE=mosaic_mnv35_cityscapes
EXP_NAME=""  # You can give any name to the experiment.
TPU_NAME=""  # The name assigned while creating a Cloud TPU.
MODEL_DIR="gs://"  # A GCS path for checkpoints and logs.

# Now launch the experiment.
python3 -m official.projects.mosaic.train \
  --experiment=$EXP_TYPE \
  --mode=train \
  --tpu=$TPU_NAME \
  --model_dir=$MODEL_DIR \
  --config_file=official/projects/mosaic/configs/experiments/mosaic_mnv35_cityscapes_tfds_tpu.yaml
```
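While a job is running, you can monitor loss and mIoU curves by pointing
TensorBoard at the model directory. A minimal sketch, assuming TensorBoard is
installed locally and you have read access to the `gs://` bucket:

```shell
# Point TensorBoard at the experiment's model directory to watch
# training curves as checkpoints and summaries are written.
tensorboard --logdir=${MODEL_DIR}
```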
## Evaluation

Run the following command for evaluation.

```shell
EXP_TYPE=mosaic_mnv35_cityscapes
EXP_NAME=""  # You can give any name to the experiment.
TPU_NAME=""  # The name assigned while creating a Cloud TPU.
MODEL_DIR="gs://"

# Now launch the experiment.
python3 -m official.projects.mosaic.train \
  --experiment=$EXP_TYPE \
  --mode=eval \
  --tpu=$TPU_NAME \
  --model_dir=$MODEL_DIR \
  --config_file=official/projects/mosaic/configs/experiments/mosaic_mnv35_cityscapes_tfds_tpu.yaml
```

## Quantization Aware Training (QAT)

We support quantization aware training (QAT) and converting the trained model
to a TFLite model for on-device inference.

### QAT Training

```shell
EXP_TYPE=mosaic_mnv35_cityscapes_qat
EXP_NAME=""  # You can give any name to the experiment.
TPU_NAME=""  # The name assigned while creating a Cloud TPU.
MODEL_DIR="gs://"
NON_QAT_CHECKPOINT="gs://"  # The checkpoint from non-QAT training.

python3 -m official.projects.mosaic.train \
  --experiment=$EXP_TYPE \
  --mode=train \
  --tpu=$TPU_NAME \
  --model_dir=$MODEL_DIR \
  --config_file=official/projects/mosaic/qat/configs/experiments/semantic_segmentation/mosaic_mnv35_cityscapes_tfds_qat_tpu.yaml \
  --params_override="task.quantization.pretrained_original_checkpoint=${NON_QAT_CHECKPOINT}"
```

### Export TFLite

First, export a SavedModel from the QAT checkpoint:

```shell
EXP_TYPE=mosaic_mnv35_cityscapes_qat
QAT_CKPT_PATH="gs://"  # The checkpoint from QAT training.
EXPORT_PATH=""  # The path of the SavedModel to be exported.
INPUT_SIZE=""  # The image size the model was trained on, e.g. 1024,2048 for Cityscapes.

python3 -m official.projects.mosaic.qat.serving.export_saved_model \
  --checkpoint_path=${QAT_CKPT_PATH} \
  --config_file=${QAT_CKPT_PATH}/params.yaml \
  --export_dir=${EXPORT_PATH} \
  --experiment=${EXP_TYPE} \
  --input_type=tflite \
  --input_image_size=${INPUT_SIZE} \
  --alsologtostderr
```

Then convert the exported SavedModel to a TFLite model:

```shell
EXP_TYPE=mosaic_mnv35_cityscapes_qat
SAVED_MODEL_DIR=""  # The path to the SavedModel exported in the previous step.
TFLITE_PATH=""  # The path of the TFLite file to be exported.

python3 -m official.projects.mosaic.qat.serving.export_tflite \
  --experiment=${EXP_TYPE} \
  --saved_model_dir=${SAVED_MODEL_DIR} \
  --tflite_path=${TFLITE_PATH} \
  --quant_type=qat \
  --alsologtostderr
```
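To reproduce on-device latency numbers like those reported below, you can run
the exported `.tflite` file through TFLite's standard `benchmark_model` tool
on a phone. A sketch, assuming you have `adb` access to the device and a
prebuilt `benchmark_model` binary for its architecture (the file names under
`/data/local/tmp` are placeholders):

```shell
# Push the TFLite model and the prebuilt benchmark binary to the device.
adb push ${TFLITE_PATH} /data/local/tmp/mosaic_qat_int8.tflite
adb push benchmark_model /data/local/tmp/
adb shell chmod +x /data/local/tmp/benchmark_model

# Measure latency with and without the XNNPACK delegate.
adb shell /data/local/tmp/benchmark_model \
  --graph=/data/local/tmp/mosaic_qat_int8.tflite \
  --use_xnnpack=true
adb shell /data/local/tmp/benchmark_model \
  --graph=/data/local/tmp/mosaic_qat_int8.tflite \
  --use_xnnpack=false
```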
### Results

The benchmark results on Cityscapes are reported below:

| Model                           | Resolution | mIoU  | mIoU (QAT INT8) | Latency (QAT INT8, ms per image on Pixel 6) | Download (ckpt) | Download (tflite) |
|---------------------------------|:----------:|------:|----------------:|:-------------------------------------------:|:---------------:|:-----------------:|
| MobileNet Multi-HW AVG + MOSAIC | 1024x2048  | 77.24 | 77.13           | 524 / 327 (w/o / w/ XNNPACK)                 | [ckpt](https://storage.googleapis.com/tf_model_garden/vision/mosaic/MobileNetMultiAVGSeg-r1024-ebf64-gp-qat.tar.gz) \| [tensorboard](https://tensorboard.dev/experiment/g0ZzmRDdRdGn5Xn07xXvwg/#scalars) | [QAT INT8](https://storage.googleapis.com/tf_model_garden/vision/mosaic/mobilenet_multiavgseg_r1024_ebf64_gp_qat/tflite_model_depthwise_qat_int8.tflite) |

## License

[![License](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](https://opensource.org/licenses/Apache-2.0)

This project is licensed under the terms of the **Apache License 2.0**.

## Citation

If you want to cite this repository in your work, please consider citing the paper.

```
@article{weijun2021mosaic,
  title={MOSAIC: Mobile Segmentation via decoding Aggregated Information and encoded Context},
  author={Weijun Wang and Andrew Howard},
  journal={arXiv preprint arXiv:2112.11623},
  year={2021},
}
```