Commit 2f43cff2 authored by Xianzhi Du, committed by A. Unique TensorFlower

Internal change

PiperOrigin-RevId: 394772382
parent bff5aad0
@@ -52,6 +52,7 @@ In the near future, we will add:
| [Mask R-CNN](vision/beta/MODEL_GARDEN.md) | [Mask R-CNN](https://arxiv.org/abs/1703.06870) |
| [ShapeMask](vision/detection) | [ShapeMask: Learning to Segment Novel Objects by Refining Shape Priors](https://arxiv.org/abs/1904.03239) |
| [SpineNet](vision/beta/MODEL_GARDEN.md) | [SpineNet: Learning Scale-Permuted Backbone for Recognition and Localization](https://arxiv.org/abs/1912.05027) |
| [Cascade RCNN-RS and RetinaNet-RS](vision/beta/MODEL_GARDEN.md) | [Simple Training Strategies and Model Scaling for Object Detection](https://arxiv.org/abs/2107.00057) |
### Natural Language Processing
......
@@ -54,9 +54,12 @@ depth, label smoothing and dropout.
### Common Settings and Notes
* We provide models adopting [ResNet-FPN](https://arxiv.org/abs/1612.03144) and
  [SpineNet](https://arxiv.org/abs/1912.05027) backbones based on detection frameworks:
  * [RetinaNet](https://arxiv.org/abs/1708.02002) and [RetinaNet-RS](https://arxiv.org/abs/2107.00057)
  * [Mask R-CNN](https://arxiv.org/abs/1703.06870)
  * [Cascade RCNN](https://arxiv.org/abs/1712.00726) and [Cascade RCNN-RS](https://arxiv.org/abs/2107.00057)
* Models are all trained on COCO train2017 and evaluated on COCO val2017.
* Training details:
  * Models finetuned from ImageNet pretrained checkpoints adopt the 12 or 36
@@ -99,13 +102,22 @@ depth, label smoothing and dropout.
### Instance Segmentation Baselines
#### Mask R-CNN (ImageNet pretrained)
#### Mask R-CNN (Trained from scratch)
| Backbone | Resolution | Epochs | FLOPs (B) | Params (M) | Box AP | Mask AP | Download |
| ------------ |:-------------:| -------:|-----------:|-----------:|-------:|--------:|---------:|
| ResNet50-FPN | 640x640 | 350 | 227.7 | 46.3 | 42.3 | 37.6 | [config](https://github.com/tensorflow/models/blob/master/official/vision/beta/configs/experiments/maskrcnn/r50fpn_640_coco_scratch_tpu4x4.yaml) |
| SpineNet-49 | 640x640 | 350 | 215.7 | 40.8 | 42.6 | 37.9 | [config](https://github.com/tensorflow/models/blob/master/official/vision/beta/configs/experiments/maskrcnn/coco_spinenet49_mrcnn_tpu.yaml) |
| SpineNet-96 | 1024x1024 | 500 | 315.0 | 55.2 | 48.1 | 42.4 | [config](https://github.com/tensorflow/models/blob/master/official/vision/beta/configs/experiments/maskrcnn/coco_spinenet96_mrcnn_tpu.yaml) |
| SpineNet-143 | 1280x1280 | 500 | 498.8 | 79.2 | 49.3 | 43.4 | [config](https://github.com/tensorflow/models/blob/master/official/vision/beta/configs/experiments/maskrcnn/coco_spinenet143_mrcnn_tpu.yaml) |
#### Cascade RCNN-RS (Trained from scratch)
| Backbone | Resolution | Epochs | Params (M) | Box AP | Mask AP | Download |
| ------------ |:-------------:| -------:|-----------:|-------:|--------:|---------:|
| SpineNet-49 | 640x640 | 500 | 56.4 | 46.4 | 40.0 | [config](https://github.com/tensorflow/models/blob/master/official/vision/beta/configs/experiments/maskrcnn/coco_spinenet49_cascadercnn_tpu.yaml) |
| SpineNet-143 | 1280x1280 | 500 | 94.9 | 51.9 | 45.0 | [config](https://github.com/tensorflow/models/blob/master/official/vision/beta/configs/experiments/maskrcnn/coco_spinenet143_cascadercnn_tpu.yaml) |
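The config links in these tables point to YAML experiment overrides. As a rough sketch of how such a file is typically consumed in the Model Garden (the experiment name mirrors the config factory shown later in this commit; the override helper, `validate()` call, and local path are assumptions about the standard workflow, not part of this change):

```python
from official.core import exp_factory
from official.modeling import hyperparams

# Start from the registered Mask R-CNN + SpineNet experiment defaults.
config = exp_factory.get_exp_config('maskrcnn_spinenet_coco')

# Overlay the YAML linked in the table above (adjust the path to your checkout).
config = hyperparams.override_params_dict(
    config,
    'official/vision/beta/configs/experiments/maskrcnn/coco_spinenet49_mrcnn_tpu.yaml',
    is_strict=True)

config.validate()
print(config.task.model.backbone.type)
```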
## Semantic Segmentation
@@ -131,7 +143,7 @@ depth, label smoothing and dropout.
### Common Settings and Notes
* We provide models for video classification with two backbones:
  [SlowOnly](https://arxiv.org/abs/1812.03982) and 3D-ResNet (R3D) used in
  [Spatiotemporal Contrastive Video Representation Learning](https://arxiv.org/abs/2008.03800).
* Training and evaluation details:
......
@@ -13,7 +13,7 @@
# limitations under the License.
# Lint as: python3
"""R-CNN(-RS) configuration definition."""
import dataclasses
import os
@@ -432,7 +432,7 @@ def maskrcnn_spinenet_coco() -> cfg.ExperimentConfig:
@exp_factory.register_config_factory('cascadercnn_spinenet_coco')
def cascadercnn_spinenet_coco() -> cfg.ExperimentConfig:
  """COCO object detection with Cascade RCNN-RS with SpineNet backbone."""
  steps_per_epoch = 463
  coco_val_samples = 5000
  train_batch_size = 256
......
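As a sanity check on the constants above, `steps_per_epoch = 463` is simply the COCO train2017 image count divided by the batch size, rounded up (the dataset size is common knowledge, not taken from this diff):

```python
import math

coco_train2017_images = 118287  # images in COCO train2017
train_batch_size = 256

steps_per_epoch = math.ceil(coco_train2017_images / train_batch_size)
print(steps_per_epoch)  # 463
```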
@@ -12,7 +12,7 @@
# See the License for the specific language governing permissions and
# limitations under the License.
"""R-CNN(-RS) models."""
from typing import Any, List, Mapping, Optional, Tuple, Union
@@ -24,7 +24,7 @@ from official.vision.beta.ops import box_ops
@tf.keras.utils.register_keras_serializable(package='Vision')
class MaskRCNNModel(tf.keras.Model):
  """The Mask R-CNN(-RS) and Cascade RCNN-RS models."""
  def __init__(self,
               backbone: tf.keras.Model,
@@ -48,7 +48,7 @@ class MaskRCNNModel(tf.keras.Model):
               aspect_ratios: Optional[List[float]] = None,
               anchor_size: Optional[float] = None,
               **kwargs):
    """Initializes the R-CNN(-RS) model.
    Args:
      backbone: `tf.keras.Model`, the backbone network.
@@ -65,19 +65,18 @@ class MaskRCNNModel(tf.keras.Model):
      mask_roi_aligner: the ROI aligner for mask prediction.
      class_agnostic_bbox_pred: if True, perform class agnostic bounding box
        prediction. Needs to be `True` for Cascade RCNN models.
      cascade_class_ensemble: if True, ensemble classification scores over all
        detection heads.
      min_level: Minimum level in output feature maps.
      max_level: Maximum level in output feature maps.
      num_scales: A number representing intermediate scales added on each level.
        For instance, num_scales=2 adds one additional intermediate anchor scale
        [2^0, 2^0.5] on each level.
      aspect_ratios: A list representing the aspect ratio anchors added on each
        level. The number indicates the ratio of width to height. For instance,
        aspect_ratios=[1.0, 2.0, 0.5] adds three anchors on each scale level.
      anchor_size: A number representing the scale of the base anchor size
        relative to the feature stride 2^level.
      **kwargs: keyword arguments to be passed.
    """
    super(MaskRCNNModel, self).__init__(**kwargs)
......
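To make the anchor arguments in the docstring concrete, here is a standalone sketch of the sizes they imply (example values chosen for illustration; this is not the model's actual anchor generator):

```python
# Per the docstring: the base anchor size scales with the feature stride
# 2^level, and num_scales inserts intermediate octave scales 2^(i / num_scales).
min_level, max_level = 2, 6      # example pyramid levels
num_scales = 2                   # yields scales [2^0, 2^0.5] per level
aspect_ratios = [0.5, 1.0, 2.0]  # width-to-height ratios
anchor_size = 8.0                # multiplier on the feature stride (example)

for level in range(min_level, max_level + 1):
  stride = 2 ** level
  sizes = [anchor_size * stride * 2 ** (i / num_scales) for i in range(num_scales)]
  # Each size is paired with every aspect ratio, so each feature-map location
  # gets num_scales * len(aspect_ratios) anchors.
  print(level, [round(s, 1) for s in sizes], len(sizes) * len(aspect_ratios))
```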