Commit 7adc6ec1 authored by Dan Kondratyuk, committed by A. Unique TensorFlower

Internal change

PiperOrigin-RevId: 373155894
parent 6d6cd4ac
# Mobile Video Networks (MoViNets)
Design doc: go/movinet
## Getting Started
```shell
bash third_party/tensorflow_models/official/vision/beta/projects/movinet/google/run_train.sh
```
## Results
Results are tracked at go/movinet-experiments.
# Mobile Video Networks (MoViNets)
[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/tensorflow/models/tree/master/official/vision/beta/projects/movinet/movinet_tutorial.ipynb)
[![TensorFlow Hub](https://img.shields.io/badge/TF%20Hub-Models-FF6F00?logo=tensorflow)](https://tfhub.dev/google/collections/movinet)
[![Paper](http://img.shields.io/badge/Paper-arXiv.2103.11511-B3181B?logo=arXiv)](https://arxiv.org/abs/2103.11511)
This repository is the official implementation of
[MoViNets: Mobile Video Networks for Efficient Video
Recognition](https://arxiv.org/abs/2103.11511).
## Description
Mobile Video Networks (MoViNets) are efficient video classification models
runnable on mobile devices. MoViNets demonstrate state-of-the-art accuracy and
efficiency on several large-scale video action recognition datasets.
There is a large gap in performance between accurate models and efficient
models for video action recognition. On the one hand, 2D MobileNet CNNs are
fast and can operate on streaming video in real time, but are prone to noisy
predictions and lower accuracy. On the other hand, 3D CNNs are accurate, but
are memory- and computation-intensive and cannot operate on streaming video.
MoViNets bridge this gap, producing:
- State-of-the-art efficiency and accuracy across the model family (MoViNet-A0
to A6).
- Streaming models with 3D causal convolutions that substantially reduce memory
usage.
- Temporal ensembles of models to boost efficiency even higher.
Small MoViNets demonstrate higher efficiency and accuracy than MobileNetV3 for
video action recognition (Kinetics 600).
MoViNets also improve efficiency by outputting high-quality predictions one
frame at a time, as opposed to the traditional multi-clip evaluation approach.
[![Multi-Clip Eval](https://storage.googleapis.com/tf_model_garden/vision/movinet/artifacts/movinet_multi_clip_eval.png)](https://arxiv.org/pdf/2103.11511.pdf)
[![Streaming Eval](https://storage.googleapis.com/tf_model_garden/vision/movinet/artifacts/movinet_stream_eval.png)](https://arxiv.org/pdf/2103.11511.pdf)
## History
- Initial Commit.
## Authors and Maintainers
* Dan Kondratyuk ([@hyperparticle](https://github.com/hyperparticle))
* Liangzhe Yuan ([@yuanliangzhe](https://github.com/yuanliangzhe))
* Yeqing Li ([@yeqingli](https://github.com/yeqingli))
## Table of Contents
- [Requirements](#requirements)
- [Results and Pretrained Weights](#results-and-pretrained-weights)
- [Kinetics 600](#kinetics-600)
- [Training and Evaluation](#training-and-evaluation)
- [References](#references)
- [License](#license)
- [Citation](#citation)
## Requirements
[![TensorFlow 2.4](https://img.shields.io/badge/TensorFlow-2.4-FF6F00?logo=tensorflow)](https://github.com/tensorflow/tensorflow/releases/tag/v2.4.0)
[![Python 3.6](https://img.shields.io/badge/Python-3.6-3776AB?logo=python)](https://www.python.org/downloads/release/python-360/)
To install requirements:
```shell
pip install -r requirements.txt
```
## Results and Pretrained Weights
[![TensorFlow Hub](https://img.shields.io/badge/TF%20Hub-Models-FF6F00?logo=tensorflow)](https://tfhub.dev/google/collections/movinet)
[![TensorBoard](https://img.shields.io/badge/TensorBoard-dev-FF6F00?logo=tensorflow)](https://tensorboard.dev/experiment/Q07RQUlVRWOY4yDw3SnSkA/)
### Kinetics 600
[![MoViNet Comparison](https://storage.googleapis.com/tf_model_garden/vision/movinet/artifacts/movinet_comparison.png)](https://arxiv.org/pdf/2103.11511.pdf)
[tensorboard.dev summary](https://tensorboard.dev/experiment/Q07RQUlVRWOY4yDw3SnSkA/)
of training runs across all models.
The table below summarizes the performance of each model and provides links to
download pretrained models. All models are evaluated on single clips with the
same resolution as training.
Streaming MoViNets will be added in the future.
| Model Name | Top-1 Accuracy | Top-5 Accuracy | GFLOPs\* | Checkpoint | TF Hub SavedModel |
|------------|----------------|----------------|----------|------------|-------------------|
| MoViNet-A0-Base | 71.41 | 90.91 | 2.7 | [checkpoint (12 MiB)](https://storage.googleapis.com/tf_model_garden/vision/movinet/movinet_a0_base.tar.gz) | [tfhub](https://tfhub.dev/tensorflow/movinet/a0/base/kinetics-600/classification/) |
| MoViNet-A1-Base | 76.01 | 93.28 | 6.0 | [checkpoint (18 MiB)](https://storage.googleapis.com/tf_model_garden/vision/movinet/movinet_a1_base.tar.gz) | [tfhub](https://tfhub.dev/tensorflow/movinet/a1/base/kinetics-600/classification/) |
| MoViNet-A2-Base | 78.03 | 93.99 | 10 | [checkpoint (20 MiB)](https://storage.googleapis.com/tf_model_garden/vision/movinet/movinet_a2_base.tar.gz) | [tfhub](https://tfhub.dev/tensorflow/movinet/a2/base/kinetics-600/classification/) |
| MoViNet-A3-Base | 81.22 | 95.35 | 57 | [checkpoint (29 MiB)](https://storage.googleapis.com/tf_model_garden/vision/movinet/movinet_a3_base.tar.gz) | [tfhub](https://tfhub.dev/tensorflow/movinet/a3/base/kinetics-600/classification/) |
| MoViNet-A4-Base | 82.96 | 95.98 | 110 | [checkpoint (44 MiB)](https://storage.googleapis.com/tf_model_garden/vision/movinet/movinet_a4_base.tar.gz) | [tfhub](https://tfhub.dev/tensorflow/movinet/a4/base/kinetics-600/classification/) |
| MoViNet-A5-Base | 84.22 | 96.36 | 280 | [checkpoint (72 MiB)](https://storage.googleapis.com/tf_model_garden/vision/movinet/movinet_a5_base.tar.gz) | [tfhub](https://tfhub.dev/tensorflow/movinet/a5/base/kinetics-600/classification/) |
\*GFLOPs per video on Kinetics 600.
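
As a quick check, the TF Hub SavedModels linked above can be loaded with
`tensorflow_hub`, mirroring the pattern used by this project's export script.
This is a minimal sketch, not the official inference pipeline: the version
suffix on the hub handle and the 8-frame clip length are assumptions, while the
172x172 input size follows the MoViNet-A0 config.

```python
import tensorflow as tf
import tensorflow_hub as hub

# TF Hub handle from the table above. A concrete version suffix
# (e.g. '.../classification/3') may need to be appended by the caller.
hub_url = 'https://tfhub.dev/tensorflow/movinet/a0/base/kinetics-600/classification'

# Videos are batches of RGB frames: [batch, frames, height, width, 3] in [0, 1].
inputs = tf.keras.layers.Input(shape=[None, None, None, 3], dtype=tf.float32)
encoder = hub.KerasLayer(hub_url, trainable=False)
outputs = encoder(inputs)
model = tf.keras.Model(inputs, outputs)

# MoViNet-A0 trains on 172x172 frames (see the A0 config); 8 frames is an
# arbitrary clip length chosen for illustration.
example_clip = tf.ones([1, 8, 172, 172, 3])
predictions = model(example_clip)  # Scores over the 600 Kinetics-600 classes.
```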
## Training and Evaluation
Please check out our [Colab Notebook](https://colab.research.google.com/github/tensorflow/models/tree/master/official/vision/beta/projects/movinet/movinet_tutorial.ipynb)
to get started with MoViNets.
Run the following command for continuous training and evaluation.
```shell
MODE=train_and_eval  # Can also be 'train'
CONFIG_FILE=official/vision/beta/projects/movinet/configs/yaml/movinet_a0_k600_8x8.yaml
python3 official/vision/beta/projects/movinet/train.py \
    --experiment=movinet_kinetics600 \
    --mode=${MODE} \
    --model_dir=/tmp/movinet/ \
    --config_file=${CONFIG_FILE} \
    --params_override="" \
    --gin_file="" \
    --gin_params="" \
    --tpu="" \
    --tf_data_service=""
```
Run the following command for evaluation.
```shell
MODE=eval  # Can also be 'eval_continuous' for use during training
CONFIG_FILE=official/vision/beta/projects/movinet/configs/yaml/movinet_a0_k600_8x8.yaml
python3 official/vision/beta/projects/movinet/train.py \
    --experiment=movinet_kinetics600 \
    --mode=${MODE} \
    --model_dir=/tmp/movinet/ \
    --config_file=${CONFIG_FILE} \
    --params_override="" \
    --gin_file="" \
    --gin_params="" \
    --tpu="" \
    --tf_data_service=""
```
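The empty flags above are optional overrides. For example, `--params_override`
accepts a string of dotted-key overrides that patches fields of the YAML config
without editing the file. The keys in this hedged example mirror the configs in
`configs/yaml/`; the specific values are illustrative only.
```shell
python3 official/vision/beta/projects/movinet/train.py \
    --experiment=movinet_kinetics600 \
    --mode=train_and_eval \
    --model_dir=/tmp/movinet/ \
    --config_file=${CONFIG_FILE} \
    --params_override='task.train_data.global_batch_size=64,trainer.train_steps=10000'
```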
## References
- [Kinetics Datasets](https://deepmind.com/research/open-source/kinetics)
- [MoViNets (Mobile Video Networks)](https://arxiv.org/abs/2103.11511)
## License
[![License](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](https://opensource.org/licenses/Apache-2.0)
This project is licensed under the terms of the **Apache License 2.0**.
## Citation
If you use this code in your research, please cite the following paper.
```
@article{kondratyuk2021movinets,
  title={MoViNets: Mobile Video Networks for Efficient Video Recognition},
  author={Dan Kondratyuk and Liangzhe Yuan and Yandong Li and Li Zhang and Matthew Brown and Boqing Gong},
  journal={arXiv preprint arXiv:2103.11511},
  year={2021}
}
```
# Copyright 2021 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Definitions for MoViNet structures.
Reference: "MoViNets: Mobile Video Networks for Efficient Video Recognition"
https://arxiv.org/pdf/2103.11511.pdf
MoViNets are efficient video classification networks that are part of a model
family, ranging from the smallest model, MoViNet-A0, to the largest model,
MoViNet-A6. Each model has a different width, depth, input resolution, and
input frame rate associated with it. See the main paper for more details.
"""
import dataclasses
from official.core import config_definitions as cfg
from official.core import exp_factory
from official.modeling import hyperparams
from official.vision.beta.configs import backbones_3d
from official.vision.beta.configs import common
from official.vision.beta.configs.google import video_classification
@dataclasses.dataclass
class Movinet(hyperparams.Config):
"""Backbone config for Base MoViNet."""
model_id: str = 'a0'
causal: bool = False
use_positional_encoding: bool = False
# Choose from ['3d', '2plus1d', '3d_2plus1d']
# 3d: default 3D convolution
# 2plus1d: (2+1)D convolution with Conv2D (2D reshaping)
# 3d_2plus1d: (2+1)D convolution with Conv3D (no 2D reshaping)
conv_type: str = '3d'
stochastic_depth_drop_rate: float = 0.2
@dataclasses.dataclass
class MovinetA0(Movinet):
"""Backbone config for MoViNet-A0.
Represents the smallest base MoViNet searched by NAS.
Reference: https://arxiv.org/pdf/2103.11511.pdf
"""
model_id: str = 'a0'
@dataclasses.dataclass
class MovinetA1(Movinet):
"""Backbone config for MoViNet-A1."""
model_id: str = 'a1'
@dataclasses.dataclass
class MovinetA2(Movinet):
"""Backbone config for MoViNet-A2."""
model_id: str = 'a2'
@dataclasses.dataclass
class MovinetA3(Movinet):
"""Backbone config for MoViNet-A3."""
model_id: str = 'a3'
@dataclasses.dataclass
class MovinetA4(Movinet):
"""Backbone config for MoViNet-A4."""
model_id: str = 'a4'
@dataclasses.dataclass
class MovinetA5(Movinet):
"""Backbone config for MoViNet-A5.
Represents the largest base MoViNet searched by NAS.
"""
model_id: str = 'a5'
@dataclasses.dataclass
class MovinetT0(Movinet):
"""Backbone config for MoViNet-T0.
MoViNet-T0 is a smaller version of MoViNet-A0 for even faster processing.
"""
model_id: str = 't0'
@dataclasses.dataclass
class Backbone3D(backbones_3d.Backbone3D):
"""Configuration for backbones.
Attributes:
type: 'str', type of backbone to be used; one of the fields below.
movinet: movinet backbone config.
"""
type: str = 'movinet'
movinet: Movinet = Movinet()
@dataclasses.dataclass
class MovinetModel(video_classification.VideoClassificationModel):
"""The MoViNet model config."""
model_type: str = 'movinet'
backbone: Backbone3D = Backbone3D()
norm_activation: common.NormActivation = common.NormActivation(
activation='swish',
norm_momentum=0.99,
norm_epsilon=1e-3,
use_sync_bn=True)
output_states: bool = False
@exp_factory.register_config_factory('movinet_kinetics600')
def movinet_kinetics600() -> cfg.ExperimentConfig:
"""Video classification on Videonet with MoViNet backbone."""
exp = video_classification.video_classification_kinetics600()
exp.task.train_data.dtype = 'bfloat16'
exp.task.validation_data.dtype = 'bfloat16'
model = MovinetModel()
exp.task.model = model
return exp
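# A hedged usage sketch (not part of the library API): the experiment
# registered above can be retrieved via `exp_factory` and individual fields
# overridden in Python before training, e.g. to switch to the MoViNet-A2
# backbone in streaming (causal) mode:
#
#   from official.core import exp_factory
#   exp = exp_factory.get_exp_config('movinet_kinetics600')
#   exp.task.model.backbone.movinet.model_id = 'a2'
#   exp.task.model.backbone.movinet.causal = True
#   exp.validate()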
# Copyright 2021 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Tests for movinet video classification."""
from absl.testing import parameterized
import tensorflow as tf
from official.core import config_definitions as cfg
from official.core import exp_factory
from official.vision.beta.configs import video_classification as exp_cfg
from official.vision.beta.projects.movinet.configs import movinet
class MovinetConfigTest(tf.test.TestCase, parameterized.TestCase):
@parameterized.parameters(
('movinet_kinetics600',),)
def test_video_classification_configs(self, config_name):
config = exp_factory.get_exp_config(config_name)
self.assertIsInstance(config, cfg.ExperimentConfig)
self.assertIsInstance(config.task, exp_cfg.VideoClassificationTask)
self.assertIsInstance(config.task.model, movinet.MovinetModel)
self.assertIsInstance(config.task.train_data, exp_cfg.DataConfig)
config.task.train_data.is_training = None
with self.assertRaises(KeyError):
config.validate()
if __name__ == '__main__':
tf.test.main()
# Video classification on Kinetics-600 using MoViNet-A0 backbone.
# --experiment_type=movinet_kinetics600
# Achieves 71.65% Top-1 accuracy.
# http://mldash/experiments/4591693621833944103
runtime:
distribution_strategy: 'tpu'
mixed_precision_dtype: 'bfloat16'
task:
losses:
l2_weight_decay: 0.00003
label_smoothing: 0.1
model:
backbone:
movinet:
model_id: 'a0'
stochastic_depth_drop_rate: 0.2
norm_activation:
use_sync_bn: true
dropout_rate: 0.5
train_data:
name: kinetics600
variant_name: rgb
feature_shape: !!python/tuple
- 50
- 172
- 172
- 3
temporal_stride: 5
random_stride_range: 1
global_batch_size: 1024
dtype: 'bfloat16'
shuffle_buffer_size: 1024
min_image_size: 192
aug_max_area_ratio: 1.0
aug_max_aspect_ratio: 2.0
aug_min_area_ratio: 0.08
aug_min_aspect_ratio: 0.5
aug_type: 'autoaug'
validation_data:
name: kinetics600
feature_shape: !!python/tuple
- 50
- 172
- 172
- 3
temporal_stride: 5
num_test_clips: 1
num_test_crops: 1
global_batch_size: 64
min_image_size: 192
dtype: 'bfloat16'
drop_remainder: false
trainer:
optimizer_config:
learning_rate:
cosine:
initial_learning_rate: 1.8
decay_steps: 85785
warmup:
linear:
warmup_steps: 2145
optimizer:
type: 'rmsprop'
rmsprop:
rho: 0.9
momentum: 0.9
epsilon: 1.0
clipnorm: 1.0
train_steps: 85785
steps_per_loop: 500
summary_interval: 500
validation_interval: 500
# Video classification on Kinetics-600 using MoViNet-A0 backbone.
# --experiment_type=movinet_kinetics600
runtime:
distribution_strategy: 'mirrored'
mixed_precision_dtype: 'float32'
task:
model:
backbone:
movinet:
model_id: 'a0'
norm_activation:
use_sync_bn: false
dropout_rate: 0.5
train_data:
name: kinetics600
variant_name: rgb
feature_shape: !!python/tuple
- 4
- 172
- 172
- 3
temporal_stride: 5
random_stride_range: 0
global_batch_size: 2
dtype: 'float32'
shuffle_buffer_size: 32
aug_max_area_ratio: 1.0
aug_max_aspect_ratio: 2.0
aug_min_area_ratio: 0.08
aug_min_aspect_ratio: 0.5
validation_data:
name: kinetics600
feature_shape: !!python/tuple
- 4
- 172
- 172
- 3
temporal_stride: 5
num_test_clips: 1
num_test_crops: 1
global_batch_size: 2
dtype: 'float32'
drop_remainder: true
trainer:
optimizer_config:
learning_rate:
cosine:
initial_learning_rate: 0.8
decay_steps: 42104
warmup:
linear:
warmup_steps: 1053
train_steps: 10
validation_steps: 10
steps_per_loop: 500
summary_interval: 500
validation_interval: 500
# Video classification on Kinetics-600 using MoViNet-A0-Stream backbone.
# --experiment_type=movinet_kinetics600
# Achieves 69.56% Top-1 accuracy.
# http://mldash/experiments/6696393165423234453
runtime:
distribution_strategy: 'tpu'
mixed_precision_dtype: 'bfloat16'
task:
losses:
l2_weight_decay: 0.00003
label_smoothing: 0.1
model:
backbone:
movinet:
model_id: 'a0'
causal: true
stochastic_depth_drop_rate: 0.2
norm_activation:
use_sync_bn: true
dropout_rate: 0.5
train_data:
name: kinetics600
variant_name: rgb
feature_shape: !!python/tuple
- 50
- 172
- 172
- 3
temporal_stride: 5
random_stride_range: 0
global_batch_size: 1024
dtype: 'bfloat16'
shuffle_buffer_size: 1024
min_image_size: 192
aug_max_area_ratio: 1.0
aug_max_aspect_ratio: 2.0
aug_min_area_ratio: 0.08
aug_min_aspect_ratio: 0.5
aug_type: 'autoaug'
validation_data:
name: kinetics600
feature_shape: !!python/tuple
- 50
- 172
- 172
- 3
temporal_stride: 5
num_test_clips: 1
num_test_crops: 1
global_batch_size: 64
min_image_size: 192
dtype: 'bfloat16'
drop_remainder: false
trainer:
optimizer_config:
learning_rate:
cosine:
initial_learning_rate: 1.8
decay_steps: 85785
warmup:
linear:
warmup_steps: 2145
optimizer:
type: 'rmsprop'
rmsprop:
rho: 0.9
momentum: 0.9
epsilon: 1.0
clipnorm: 1.0
train_steps: 85785
steps_per_loop: 500
summary_interval: 500
validation_interval: 500
# Video classification on Kinetics-600 using MoViNet-A1 backbone.
# --experiment_type=movinet_kinetics600
# Achieves 76.63% Top-1 accuracy.
# http://mldash/experiments/6004897086445740406
runtime:
distribution_strategy: 'tpu'
mixed_precision_dtype: 'bfloat16'
task:
losses:
l2_weight_decay: 0.00003
label_smoothing: 0.1
model:
backbone:
movinet:
model_id: 'a1'
stochastic_depth_drop_rate: 0.2
norm_activation:
use_sync_bn: true
dropout_rate: 0.5
train_data:
name: kinetics600
variant_name: rgb
feature_shape: !!python/tuple
- 50
- 172
- 172
- 3
temporal_stride: 5
random_stride_range: 1
global_batch_size: 1024
dtype: 'bfloat16'
shuffle_buffer_size: 1024
min_image_size: 192
aug_max_area_ratio: 1.0
aug_max_aspect_ratio: 2.0
aug_min_area_ratio: 0.08
aug_min_aspect_ratio: 0.5
aug_type: 'autoaug'
validation_data:
name: kinetics600
feature_shape: !!python/tuple
- 50
- 172
- 172
- 3
temporal_stride: 5
num_test_clips: 1
num_test_crops: 1
global_batch_size: 64
min_image_size: 192
dtype: 'bfloat16'
drop_remainder: false
trainer:
optimizer_config:
learning_rate:
cosine:
initial_learning_rate: 1.8
decay_steps: 85785
warmup:
linear:
warmup_steps: 2145
optimizer:
type: 'rmsprop'
rmsprop:
rho: 0.9
momentum: 0.9
epsilon: 1.0
clipnorm: 1.0
train_steps: 85785
steps_per_loop: 500
summary_interval: 500
validation_interval: 500
# Video classification on Kinetics-600 using MoViNet-A1-Stream backbone.
# --experiment_type=movinet_kinetics600
# Achieves x% Top-1 accuracy.
# http://mldash/experiments/
runtime:
distribution_strategy: 'tpu'
mixed_precision_dtype: 'bfloat16'
task:
losses:
l2_weight_decay: 0.00003
label_smoothing: 0.1
model:
backbone:
movinet:
model_id: 'a1'
causal: true
norm_activation:
use_sync_bn: true
dropout_rate: 0.5
stochastic_depth_rate: 0.2
train_data:
name: kinetics600
variant_name: rgb
feature_shape: !!python/tuple
- 50
- 172
- 172
- 3
temporal_stride: 5
random_stride_range: 0
global_batch_size: 1024
dtype: 'bfloat16'
shuffle_buffer_size: 1024
min_image_size: 192
aug_max_area_ratio: 1.0
aug_max_aspect_ratio: 2.0
aug_min_area_ratio: 0.08
aug_min_aspect_ratio: 0.5
aug_type: 'autoaug'
validation_data:
name: kinetics600
feature_shape: !!python/tuple
- 50
- 172
- 172
- 3
temporal_stride: 5
num_test_clips: 1
num_test_crops: 1
global_batch_size: 64
min_image_size: 192
dtype: 'bfloat16'
drop_remainder: false
trainer:
optimizer_config:
learning_rate:
cosine:
initial_learning_rate: 1.8
decay_steps: 85785
warmup:
linear:
warmup_steps: 2145
optimizer:
type: 'rmsprop'
rmsprop:
rho: 0.9
momentum: 0.9
epsilon: 1.0
clipnorm: 1.0
train_steps: 85785
steps_per_loop: 500
summary_interval: 500
validation_interval: 500
# Video classification on Kinetics-600 using MoViNet-A2 backbone.
# --experiment_type=movinet_kinetics600
# Achieves 78.62% Top-1 accuracy.
# http://mldash/experiments/7122292520723231204
runtime:
distribution_strategy: 'tpu'
mixed_precision_dtype: 'bfloat16'
task:
losses:
l2_weight_decay: 0.00003
label_smoothing: 0.1
model:
backbone:
movinet:
model_id: 'a2'
stochastic_depth_drop_rate: 0.2
norm_activation:
use_sync_bn: true
dropout_rate: 0.5
train_data:
name: kinetics600
variant_name: rgb
feature_shape: !!python/tuple
- 50
- 224
- 224
- 3
temporal_stride: 5
random_stride_range: 1
global_batch_size: 1024
dtype: 'bfloat16'
shuffle_buffer_size: 1024
min_image_size: 256
aug_max_area_ratio: 1.0
aug_max_aspect_ratio: 2.0
aug_min_area_ratio: 0.08
aug_min_aspect_ratio: 0.5
aug_type: 'autoaug'
validation_data:
name: kinetics600
feature_shape: !!python/tuple
- 50
- 224
- 224
- 3
temporal_stride: 5
num_test_clips: 1
num_test_crops: 1
global_batch_size: 64
min_image_size: 256
dtype: 'bfloat16'
drop_remainder: false
trainer:
optimizer_config:
learning_rate:
cosine:
initial_learning_rate: 1.8
decay_steps: 85785
warmup:
linear:
warmup_steps: 2145
optimizer:
type: 'rmsprop'
rmsprop:
rho: 0.9
momentum: 0.9
epsilon: 1.0
clipnorm: 1.0
train_steps: 85785
steps_per_loop: 500
summary_interval: 500
validation_interval: 500
# Video classification on Kinetics-600 using MoViNet-A2-Stream backbone.
# --experiment_type=movinet_kinetics600
# Achieves 78.40% Top-1 accuracy.
# http://mldash/experiments/3089118812758230318
runtime:
distribution_strategy: 'tpu'
mixed_precision_dtype: 'bfloat16'
task:
losses:
l2_weight_decay: 0.00003
label_smoothing: 0.1
model:
backbone:
movinet:
model_id: 'a2'
causal: true
norm_activation:
use_sync_bn: true
dropout_rate: 0.5
stochastic_depth_rate: 0.2
train_data:
name: kinetics600
variant_name: rgb
feature_shape: !!python/tuple
- 50
- 224
- 224
- 3
temporal_stride: 5
random_stride_range: 0
global_batch_size: 1024
dtype: 'bfloat16'
shuffle_buffer_size: 1024
min_image_size: 256
aug_max_area_ratio: 1.0
aug_max_aspect_ratio: 2.0
aug_min_area_ratio: 0.08
aug_min_aspect_ratio: 0.5
aug_type: 'autoaug'
validation_data:
name: kinetics600
feature_shape: !!python/tuple
- 50
- 224
- 224
- 3
temporal_stride: 5
num_test_clips: 1
num_test_crops: 1
global_batch_size: 64
min_image_size: 256
dtype: 'bfloat16'
drop_remainder: false
trainer:
optimizer_config:
learning_rate:
cosine:
initial_learning_rate: 1.8
decay_steps: 85785
warmup:
linear:
warmup_steps: 2145
optimizer:
type: 'rmsprop'
rmsprop:
rho: 0.9
momentum: 0.9
epsilon: 1.0
clipnorm: 1.0
train_steps: 85785
steps_per_loop: 500
summary_interval: 500
validation_interval: 500
# Video classification on Kinetics-600 using MoViNet-A3 backbone.
# --experiment_type=movinet_kinetics600
# Achieves 81.79% Top-1 accuracy.
# http://mldash/experiments/1893120685388985498
runtime:
distribution_strategy: 'tpu'
mixed_precision_dtype: 'bfloat16'
task:
losses:
l2_weight_decay: 0.00003
label_smoothing: 0.1
model:
backbone:
movinet:
model_id: 'a3'
stochastic_depth_drop_rate: 0.2
norm_activation:
use_sync_bn: true
dropout_rate: 0.5
train_data:
name: kinetics600
variant_name: rgb
feature_shape: !!python/tuple
- 64
- 256
- 256
- 3
temporal_stride: 2
random_stride_range: 1
global_batch_size: 1024
dtype: 'bfloat16'
shuffle_buffer_size: 1024
min_image_size: 288
aug_max_area_ratio: 1.0
aug_max_aspect_ratio: 2.0
aug_min_area_ratio: 0.08
aug_min_aspect_ratio: 0.5
aug_type: 'autoaug'
validation_data:
name: kinetics600
feature_shape: !!python/tuple
- 120
- 256
- 256
- 3
temporal_stride: 2
num_test_clips: 1
num_test_crops: 1
global_batch_size: 64
min_image_size: 288
dtype: 'bfloat16'
drop_remainder: false
trainer:
optimizer_config:
learning_rate:
cosine:
initial_learning_rate: 1.8
decay_steps: 85785
warmup:
linear:
warmup_steps: 2145
optimizer:
type: 'rmsprop'
rmsprop:
rho: 0.9
momentum: 0.9
epsilon: 1.0
clipnorm: 1.0
train_steps: 85785
steps_per_loop: 500
summary_interval: 500
validation_interval: 500
# Video classification on Kinetics-600 using MoViNet-A3-Stream backbone.
# --experiment_type=movinet_kinetics600
# Achieves x% Top-1 accuracy.
# http://mldash/experiments/
runtime:
distribution_strategy: 'tpu'
mixed_precision_dtype: 'bfloat16'
task:
losses:
l2_weight_decay: 0.00003
label_smoothing: 0.1
model:
backbone:
movinet:
model_id: 'a3'
norm_activation:
use_sync_bn: true
dropout_rate: 0.5
stochastic_depth_rate: 0.2
train_data:
name: kinetics600
variant_name: rgb
feature_shape: !!python/tuple
- 64
- 256
- 256
- 3
temporal_stride: 2
random_stride_range: 0
global_batch_size: 1024
dtype: 'bfloat16'
shuffle_buffer_size: 1024
min_image_size: 288
aug_max_area_ratio: 1.0
aug_max_aspect_ratio: 2.0
aug_min_area_ratio: 0.08
aug_min_aspect_ratio: 0.5
aug_type: 'autoaug'
validation_data:
name: kinetics600
feature_shape: !!python/tuple
- 120
- 256
- 256
- 3
temporal_stride: 2
num_test_clips: 1
num_test_crops: 1
global_batch_size: 64
min_image_size: 288
dtype: 'bfloat16'
drop_remainder: false
trainer:
optimizer_config:
learning_rate:
cosine:
initial_learning_rate: 1.8
decay_steps: 85785
warmup:
linear:
warmup_steps: 2145
optimizer:
type: 'rmsprop'
rmsprop:
rho: 0.9
momentum: 0.9
epsilon: 1.0
clipnorm: 1.0
train_steps: 85785
steps_per_loop: 500
summary_interval: 500
validation_interval: 500
# Video classification on Kinetics-600 using MoViNet-A4 backbone.
# --experiment_type=movinet_kinetics600
# Achieves 83.48% Top-1 accuracy.
# http://mldash/experiments/8781090241570014456
runtime:
distribution_strategy: 'tpu'
mixed_precision_dtype: 'bfloat16'
task:
losses:
l2_weight_decay: 0.00003
label_smoothing: 0.1
model:
backbone:
movinet:
model_id: 'a4'
stochastic_depth_drop_rate: 0.2
norm_activation:
use_sync_bn: true
dropout_rate: 0.5
train_data:
name: kinetics600
variant_name: rgb
feature_shape: !!python/tuple
- 32
- 290
- 290
- 3
temporal_stride: 3
random_stride_range: 1
global_batch_size: 1024
dtype: 'bfloat16'
shuffle_buffer_size: 1024
min_image_size: 320
aug_max_area_ratio: 1.0
aug_max_aspect_ratio: 2.0
aug_min_area_ratio: 0.08
aug_min_aspect_ratio: 0.5
aug_type: 'autoaug'
validation_data:
name: kinetics600
feature_shape: !!python/tuple
- 80
- 290
- 290
- 3
temporal_stride: 3
num_test_clips: 1
num_test_crops: 1
global_batch_size: 64
min_image_size: 320
dtype: 'bfloat16'
drop_remainder: false
trainer:
optimizer_config:
learning_rate:
cosine:
initial_learning_rate: 1.8
decay_steps: 85785
warmup:
linear:
warmup_steps: 2145
optimizer:
type: 'rmsprop'
rmsprop:
rho: 0.9
momentum: 0.9
epsilon: 1.0
clipnorm: 1.0
train_steps: 85785
steps_per_loop: 500
summary_interval: 500
validation_interval: 500
# Video classification on Kinetics-600 using MoViNet-A5 backbone.
# --experiment_type=movinet_kinetics600
# Achieves 84.00% Top-1 accuracy.
# http://mldash/experiments/2864919645986275853
runtime:
distribution_strategy: 'tpu'
mixed_precision_dtype: 'bfloat16'
task:
losses:
l2_weight_decay: 0.00003
label_smoothing: 0.1
model:
backbone:
movinet:
model_id: 'a5'
stochastic_depth_drop_rate: 0.2
norm_activation:
use_sync_bn: true
dropout_rate: 0.5
train_data:
name: kinetics600
variant_name: rgb
feature_shape: !!python/tuple
- 32
- 320
- 320
- 3
temporal_stride: 2
random_stride_range: 1
global_batch_size: 1024
dtype: 'bfloat16'
shuffle_buffer_size: 1024
min_image_size: 368
aug_max_area_ratio: 1.0
aug_max_aspect_ratio: 2.0
aug_min_area_ratio: 0.08
aug_min_aspect_ratio: 0.5
aug_type: 'randaug'
validation_data:
name: kinetics600
feature_shape: !!python/tuple
- 120
- 320
- 320
- 3
temporal_stride: 2
num_test_clips: 1
num_test_crops: 1
global_batch_size: 32
min_image_size: 368
dtype: 'bfloat16'
drop_remainder: false
trainer:
optimizer_config:
learning_rate:
cosine:
initial_learning_rate: 1.8
decay_steps: 85785
warmup:
linear:
warmup_steps: 2145
optimizer:
type: 'rmsprop'
rmsprop:
rho: 0.9
momentum: 0.9
epsilon: 1.0
clipnorm: 1.0
train_steps: 85785
steps_per_loop: 500
summary_interval: 500
validation_interval: 500
# Video classification on Kinetics-600 using MoViNet-T0 backbone.
# --experiment_type=movinet_kinetics600
# Achieves 68.40% Top-1 accuracy.
# http://mldash/experiments/3958407113491615048
runtime:
distribution_strategy: 'tpu'
mixed_precision_dtype: 'bfloat16'
task:
losses:
l2_weight_decay: 0.00003
label_smoothing: 0.1
model:
backbone:
movinet:
model_id: 't0'
stochastic_depth_drop_rate: 0.2
norm_activation:
use_sync_bn: true
dropout_rate: 0.5
train_data:
name: kinetics600
variant_name: rgb
feature_shape: !!python/tuple
- 25
- 160
- 160
- 3
temporal_stride: 10
random_stride_range: 0
global_batch_size: 1024
dtype: 'bfloat16'
shuffle_buffer_size: 1024
min_image_size: 176
aug_max_area_ratio: 1.0
aug_max_aspect_ratio: 2.0
aug_min_area_ratio: 0.08
aug_min_aspect_ratio: 0.5
validation_data:
name: kinetics600
feature_shape: !!python/tuple
- 25
- 160
- 160
- 3
temporal_stride: 10
num_test_clips: 1
num_test_crops: 1
global_batch_size: 64
min_image_size: 176
dtype: 'bfloat16'
drop_remainder: false
trainer:
optimizer_config:
learning_rate:
cosine:
initial_learning_rate: 1.8
decay_steps: 85785
warmup:
linear:
warmup_steps: 2145
optimizer:
type: 'rmsprop'
rmsprop:
rho: 0.9
momentum: 0.9
epsilon: 1.0
clipnorm: 1.0
train_steps: 85785
steps_per_loop: 500
summary_interval: 500
validation_interval: 500
# Video classification on Kinetics-600 using MoViNet-T0-Stream backbone.
# --experiment_type=movinet_kinetics600
# Achieves x% Top-1 accuracy.
# http://mldash/experiments/
runtime:
distribution_strategy: 'tpu'
mixed_precision_dtype: 'bfloat16'
task:
losses:
l2_weight_decay: 0.00003
label_smoothing: 0.1
model:
backbone:
movinet:
model_id: 't0'
norm_activation:
use_sync_bn: true
dropout_rate: 0.5
stochastic_depth_rate: 0.2
train_data:
name: kinetics600
variant_name: rgb
feature_shape: !!python/tuple
- 25
- 160
- 160
- 3
temporal_stride: 10
random_stride_range: 0
global_batch_size: 1024
dtype: 'bfloat16'
shuffle_buffer_size: 1024
min_image_size: 176
aug_max_area_ratio: 1.0
aug_max_aspect_ratio: 2.0
aug_min_area_ratio: 0.08
aug_min_aspect_ratio: 0.5
validation_data:
name: kinetics600
feature_shape: !!python/tuple
- 25
- 160
- 160
- 3
temporal_stride: 10
num_test_clips: 1
num_test_crops: 1
global_batch_size: 64
min_image_size: 176
dtype: 'bfloat16'
drop_remainder: false
trainer:
optimizer_config:
learning_rate:
cosine:
initial_learning_rate: 1.8
decay_steps: 85785
warmup:
linear:
warmup_steps: 2145
optimizer:
type: 'rmsprop'
rmsprop:
rho: 0.9
momentum: 0.9
epsilon: 1.0
clipnorm: 1.0
train_steps: 85785
steps_per_loop: 500
summary_interval: 500
validation_interval: 500
# Copyright 2021 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# Lint as: python3
r"""Exports models to tf.saved_model.
Export example:
```shell
python3 export_saved_model.py \
--output_path=/tmp/movinet/ \
--model_id=a0 \
--causal=True \
--use_2plus1d=False \
--num_classes=600 \
--checkpoint_path=""
```
To use an exported saved_model in various applications:
```python
import tensorflow as tf
import tensorflow_hub as hub
saved_model_path = ...
inputs = tf.keras.layers.Input(
shape=[None, None, None, 3],
dtype=tf.float32)
encoder = hub.KerasLayer(saved_model_path, trainable=True)
outputs = encoder(inputs)
model = tf.keras.Model(inputs, outputs)
example_input = tf.ones([1, 8, 172, 172, 3])
outputs = model(example_input)
```
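For a model exported with `--causal=True`, the saved module also exposes a
`stream` signature that carries state between calls (a hedged sketch based on
the `ExportStateModule` defined below; the clip shape is illustrative):
```python
import tensorflow as tf

module = tf.saved_model.load(saved_model_path)

# Calling the module directly runs the model with empty initial states and
# returns (outputs, states); `stream` then consumes and returns updated states.
clip = tf.ones([1, 1, 172, 172, 3])
outputs, states = module(clip)
for _ in range(7):
  outputs, states = module.stream(clip, states)
```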
"""
from typing import Sequence
from absl import app
from absl import flags
import tensorflow as tf
from official.vision.beta.projects.movinet.modeling import movinet
from official.vision.beta.projects.movinet.modeling import movinet_model
flags.DEFINE_string(
'output_path', '/tmp/movinet/',
'Path to the exported saved_model directory.')
flags.DEFINE_string(
'model_id', 'a0', 'MoViNet model name.')
flags.DEFINE_bool(
'causal', False, 'Run the model in causal mode.')
flags.DEFINE_bool(
'use_2plus1d', False, 'Use (2+1)D features instead of 3D.')
flags.DEFINE_integer(
'num_classes', 600, 'The number of classes for prediction.')
flags.DEFINE_string(
'checkpoint_path', '',
'Checkpoint path to load. Leave blank for default initialization.')
FLAGS = flags.FLAGS
def main(argv: Sequence[str]) -> None:
if len(argv) > 1:
raise app.UsageError('Too many command-line arguments.')
# Use dimensions of 1 for all axes except the channels to export faster,
# since we only need the last (channel) dimension to build the model and get
# the output states. These dimensions will be set to `None` once the model is
# built.
input_shape = [1, 1, 1, 1, 3]
backbone = movinet.Movinet(
FLAGS.model_id, causal=FLAGS.causal, use_2plus1d=FLAGS.use_2plus1d)
model = movinet_model.MovinetClassifier(
backbone, num_classes=FLAGS.num_classes, output_states=FLAGS.causal)
model.build(input_shape)
if FLAGS.checkpoint_path:
model.load_weights(FLAGS.checkpoint_path)
if FLAGS.causal:
# Call the model once to get the output states. Call again with the `states`
# input to ensure that the signature with the `states` argument is built.
_, states = model(dict(image=tf.ones(input_shape), states={}))
_, states = model(dict(image=tf.ones(input_shape), states=states))
input_spec = tf.TensorSpec(
shape=[None, None, None, None, 3],
dtype=tf.float32,
name='inputs')
state_specs = {}
for name, state in states.items():
shape = state.shape
if len(state.shape) == 5:
shape = [None, state.shape[1], None, None, state.shape[-1]]
new_spec = tf.TensorSpec(shape=shape, dtype=state.dtype, name=name)
state_specs[name] = new_spec
specs = (input_spec, state_specs)
# Define a tf.Module with custom signatures to allow it to accept
# a state dict as an argument. We define it inline here because
# we first need to determine the shape of the state tensors before
# applying the `input_signature` argument to `tf.function`.
class ExportStateModule(tf.Module):
"""Module with state for exporting to saved_model."""
def __init__(self, model):
self.model = model
@tf.function(input_signature=[input_spec])
def __call__(self, inputs):
return self.model(dict(image=inputs, states={}))
@tf.function(input_signature=[input_spec])
def base(self, inputs):
return self.model(dict(image=inputs, states={}))
@tf.function(input_signature=specs)
def stream(self, inputs, states):
return self.model(dict(image=inputs, states=states))
module = ExportStateModule(model)
tf.saved_model.save(module, FLAGS.output_path)
else:
_ = model(tf.ones(input_shape))
tf.keras.models.save_model(model, FLAGS.output_path)
print(' ----- Done. Saved Model is saved at {}'.format(FLAGS.output_path))
if __name__ == '__main__':
app.run(main)